Abstract. Representing the conditional independences present in a multivariate random vector via graphs 
has found widespread use in applications, and such representations are popularly known as graphical models 
or Markov random fields. These models have many useful properties, but their fundamental attractive feature 
is their ability to reflect conditional independences between blocks of variables through graph separation, a 
consequence of the equivalence of the pairwise, local and global Markov properties demonstrated by Pearl and 
Paz (1985). Modern-day applications often necessitate working with either an infinite collection of variables 
(such as in a spatial-temporal field) or approximating a large high-dimensional finite stochastic system with 
an infinite-dimensional system. However, it is unclear whether the conditional independences present in an 
infinite-dimensional random vector or stochastic process can still be represented by separation criteria in an 
infinite graph. In light of the advantages of using graphs as tools to represent stochastic relationships, we 
undertake in this paper a general study of infinite graphical models. First, we demonstrate that naive extensions 
of the assumptions required for the finite case results do not yield equivalence of the Markov properties in 
the infinite-dimensional setting, thus calling for a more in-depth analysis. To this end, we proceed to derive 
general conditions which do allow representing the conditional independence in an infinite-dimensional random 
system by means of graphs, and our results render the result of Pearl and Paz as a special case of a more general 
phenomenon. We conclude by demonstrating the applicability of our theory through concrete examples of 
infinite-dimensional graphical models. 
1. Introduction 

1.1. Background. Representing the conditional independences present in a multivariate random vector via 
graphs has found widespread use in applications, and has also led to important theoretical advances in 
probability. Such probabilistic Markov models are popularly known as graphical models or Markov random 
fields, and have seen rapid development in recent years. Though the initial motivation for their development 
primarily stemmed from statistical physics [?], they have become particularly relevant in modern 
applications. In particular, graphical models are widely used to provide easily interpreted graphical representations 
of complex multivariate dependencies that are present in a large collection of random variables. Thus, 
they are very valuable for analyzing modern high-dimensional throughput data, and have become staples in 
contemporary statistics, computer science, and allied fields. 

In such models, the nodes of a graph correspond to random variables. The absence of an edge between a 
pair of nodes represents conditional independence between the corresponding two random variables given all 
the other variables, and is known as the pairwise Markov property. A particularly useful feature of graphical 
models is the equivalence of the so-called pairwise (P), local (L), and global (G) Markov properties. These 
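Markov properties relate a graph $\mathcal{G} = (V,E)$ and a collection of random variables $(X_i)_{i \in V}$ as follows: 

• (P) For $i, j \in V$, $i \not\sim j$ implies $X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}$. 

• (L) Given $i \in V$, $X_i \perp\!\!\!\perp X_{V \setminus \mathrm{cl}(i)} \mid X_{\mathrm{ne}(i)}$. 

• (G) Given $A, B \subseteq V$, and $C \subseteq V$ separating $A$ and $B$, we have $X_A \perp\!\!\!\perp X_B \mid X_C$. 

Here we have used the following notation: 1) for $A \subseteq V$, $X_A = (X_a)_{a \in A}$, 2) $i \sim j$ means nodes $i$ and $j$ are 
adjacent in $\mathcal{G}$, 3) $\mathrm{ne}(i)$ is the neighbor set of $i$ in $\mathcal{G}$ (so that $\mathrm{ne}(i) = \{j \in V \mid i \sim j\}$), and 4) $\mathrm{cl}(i) = \{i\} \cup \mathrm{ne}(i)$. 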

Whereas the pairwise Markov relationship represents independences between a pair of nodes, the global 
Markov property allows one to infer from the graph conditional independences between two blocks of 
variables, given a third block, if the first two blocks are “separated” by this third block in the graph. 
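
The separation criterion behind the global Markov property is easy to check mechanically on a finite graph. The following is a minimal sketch and is not taken from the paper: the adjacency-dictionary representation, the function name `separates`, and the path graph are assumptions made purely for illustration. It removes the candidate separator and tests by breadth-first search whether any vertex of the second block is still reachable from the first.

```python
from collections import deque

def separates(adj, A, B, C):
    """Return True if every path from A to B passes through C.

    adj maps each vertex to the set of its neighbors (undirected graph).
    A, B, C are assumed to be pairwise disjoint sets of vertices.
    """
    blocked = set(C)
    targets = set(B)
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in targets:
            return False  # reached B without touching C
        for w in adj[v]:
            if w not in blocked and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Hypothetical example: the path graph 1 - 2 - 3 - 4 - 5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(separates(path, {1, 2}, {4, 5}, {3}))  # the middle vertex cuts the path
print(separates(path, {1}, {5}, set()))      # nothing blocks 1 from 5
```

If a random vector indexed by this path graph satisfies (G), the first call corresponds to the statement that the block (X_1, X_2) is conditionally independent of the block (X_4, X_5) given X_3.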

One of the cornerstones of the field of graphical models is the equivalence of these three Markov properties 
for finite graphical models, which holds under relatively mild conditions and was first demonstrated 
by Pearl and Paz [18]. This equivalence result provides a way to relate the local and global conditional 
independence structure within a collection of random variables, and establishes the value of the graphical 
representation of these models. However, this equivalence between the pairwise, local, and global Markov 
properties does not readily extend to infinite collections of random variables. 

There are both theoretical and practical reasons for studying infinite-dimensional graphical models. First, 
the study of the limits of sequences of graphs has seen much development in recent years [1, 2, 3], yet the 
ability of these “infinite graphs” to represent multivariate dependencies in an infinite-dimensional random 
vector is not well understood. Second, modern applications have to contend with very high-dimensional 
data, and approximating such large finite systems with an infinite system has some clear practical advantages. 
For instance, by the same broad principles relevant when using general asymptotic or limiting 
approximations to better understand or analyze finite models, an infinite-dimensional graphical model can also 
serve to better illustrate the salient features of a large finite counterpart. We expand on these and other 
compelling reasons for studying infinite-dimensional graphical models in Subsection 1.2. 

The goal of this paper is therefore to understand the equivalence of the global, local, and pairwise Markov 
properties for an infinite collection of variables. We begin by first introducing a graphical framework which 
allows us to establish the equivalence of analogues of the pairwise, local, and global Markov properties for 
ternary relations on infinite graphs from a purely set-theoretic perspective (Section 4). We then develop 
the probability theory required to apply this graphical framework to the relation induced by conditional 
independence on a countable collection of random variables (Section 5). The set-theoretic framework is 
introduced prior to developing the relevant probability theory in order to separate the difficulties arising from 
infinite graphs from those arising from infinite probability distributions. Finally, we demonstrate the broad applicability of 
our results through two general classes of graphical stochastic processes: a) Gaussian processes (Section 6), 
and b) discrete (i.e., $\{0,1\}$-valued) processes (Section 7). More specifically, we obtain sufficient conditions 
for the equivalence of the Markov properties in each of these contexts, and subsequently verify them on a 
collection of examples. These examples include, among others, autoregressive processes in the Gaussian 
setting, and an infinite-dimensional extension of the Ising model in the discrete setting. For these examples, 
we also actually verify one of the Markov properties, allowing us to immediately conclude the other two by 
equivalence, and thus establish the collection of variables as an infinite-dimensional graphical model. 
The proofs of the various results contained in this paper are placed either in the main text, the Appendix, 
or the Supplemental section depending on the centrality of the result and so that they do not excessively 
detract from the flow in the main body of the paper. 

1.2. Motivation. One of the primary reasons for the popularity of graphical models in modern applications 
stems from their ability to facilitate the inference of conditional independences between large sets of variables 
(as opposed to conditional independence between just two variables). This is achieved through the 
global Markov property by using separation statements in graphs, and relies fundamentally on the equivalence 
of the global, local, and pairwise Markov properties. The Pearl and Paz [18] result relating the various 
Markov properties in the finite setting has already been very useful for these purposes, but the more general 
infinite case has yet to receive a similar treatment. We provide a number of compelling reasons why such an 
investigation is important and long overdue, both from a theoretical and application perspective: 

(1) Due to the nature of modern data science and “Big Data” applications, the number of variables or 
features under consideration is becoming increasingly large. In fact, the number of variables 
can easily be in the hundreds of thousands, or even many millions (e.g., gene expression data, eQTL 
high throughput data, single nucleotide polymorphism data, remote sensing data, high frequency 
trading data, etc.). The verification of the equivalence of the Markov properties by checking that 
the joint density is positive is simply not feasible for many modern applications for at least two 
reasons. First, note that the enormous number of variables is often accompanied by limited samples. 
This sample-starved setting leads to extremely sparsely distributed data in very high-dimensional 
space. In such settings traditional frequency curves or histograms are not useful for describing the 
underlying data generating mechanisms or probability models. Second, the very large number of 
variables also means that the shape and support of each of the marginal densities cannot be easily 
checked, let alone the shape and support of the joint density. While the conditions for equivalence 
of the Markov properties for infinite collections of random variables established in this paper do 
imply the existence of a positive density, they may in some cases serve as a more directly verifiable 
safeguard for the very high-dimensional but still finite setting, thus still allowing one to invoke the 
global Markov property. 

(2) Many collections of random variables are by their very nature infinite-dimensional. For instance, 
many collections of random variables evolve in time and/or space with either a countable or uncountable 
index set. Though such processes may be observed only at finitely many times due to practical 
constraints, they are in principle actually infinite. At least in discrete examples of such settings (e.g., 
infinite lattice based models), an understanding of the conditional independences between variables 
that are present in the true underlying random process can only be achieved by examining the infinite 
analogues of the Markov properties. 

(3) Consider the case of the standard Gaussian graphical model. In the finite-dimensional setting it is 
well known that graphical structure corresponds to classes of covariance matrices with zeros in the 
inverse. The families of distributions of inferential interest here are thus Gaussian distributions parameterized 
by sparse inverse covariance matrices. We shall show in this paper that in order to obtain 
graphical structure in the infinite setting, further conditions are required on infinite-dimensional covariance 
matrices that parameterize Gaussian processes. In particular, our investigation into infinite-dimensional 
probabilistic models identifies classes of probability measures which enjoy graphical 
structure. Establishing and identifying classes of such infinite graphical models therefore serves the 
important purpose of laying the probabilistic foundations required for statistical inference. It specifies 
clearly the families of measures on which inference is to be undertaken if one is interested in 
obtaining probabilistic data generating mechanisms which describe observed data in a parsimonious 
way. 

(4) Recall that theoretical guarantees of traditional statistical inferential methods are proven in the “classical 
asymptotic regime” when the number of variables (or dimension) p remains fixed and the sample 
size n tends to infinity. The advent of high throughput data, especially in the last two decades, 
has seen inferential techniques for estimating or recovering a sparse graphical model in the “mixed 
asymptotic regime” when both sample size n and number of variables p tend to infinity. Very recently, 
a new class of methods for recovering sparse graphical models has been proposed in the 
“purely high-dimensional asymptotic regime” when the sample size n is actually fixed but the dimension 
p tends to infinity [10, 11]. This regime has been argued as the appropriate regime for 
modern “Big Data” applications and corresponds to the infinite-dimensional setting. Establishing 
theoretical safeguards of inferential procedures in the “purely high-dimensional” asymptotic regime, 
however, requires conditions on both the joint distributions as well as sparsity in parameters of interest 
[10, 11]. These technical conditions are motivated only by statistical considerations, but it 
is not immediately clear if these inferential procedures actually yield true probabilistic graphical 
models in the infinite setting (i.e., in which the global Markov property can be invoked). The work 
in this paper thus also serves to address this gap by undertaking a probabilistic treatment of infinite 
graphical models. 

(5) Another compelling motivation for our work is an important technical one. Recall that the equivalence 
of the Markov properties in the finite setting relies fundamentally on the so-called intersection 
property. We demonstrate in this paper that a naive or straightforward extension of the intersection 
property to infinite sets, and the verification thereof, does not ensure the equivalence of the 
global, local, and pairwise Markov properties. Thus the ability to infer conditional independences 
between blocks of variables in the infinite setting is simply not guaranteed using assumptions from 
the finite-dimensional case. The verification of such assumptions may give the false impression that 
the global Markov property holds when it actually does not, motivating a rigorous treatment of the 
infinite-dimensional setting. 
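
Item (3) above recalls that, in the finite Gaussian setting, graphical structure corresponds to zeros in the inverse covariance matrix. The following minimal sketch illustrates this on a three-variable example. The covariance entries phi^|i-j| (the correlation structure of a stationary first-order autoregression), the value phi = 1/2, and the helper `inverse` are assumptions chosen for illustration and do not come from the paper. Exact rational arithmetic is used so that the zero entry is exact rather than approximate.

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(M)
    # Augment M with the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

phi = Fraction(1, 2)  # hypothetical correlation parameter
sigma = [[phi ** abs(i - j) for j in range(3)] for i in range(3)]
omega = inverse(sigma)
for row in omega:
    print([str(x) for x in row])
```

The covariance matrix itself has no zero entries, but the corner entries of its inverse vanish. For a Gaussian vector this is the statement that the first and third variables are conditionally independent given the second, which matches the path graph 1 - 2 - 3.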


2. Summary of Main Results 

We now provide an overview of the main results in the paper. Our first goal is to obtain equivalence 
of the pairwise, local, and global Markov properties for ternary relations. First, we provide the necessary 
definitions. 

Definition 2.1. Let $V$ be any set. Let $\cdot \perp \cdot \mid \cdot$ be a ternary relation on the power set $\mathscr{P}(V)$. The following 
properties of the relation are satisfied if their respective statements are true for all (potentially infinite) sets 
$X, Y, Z, W \in \mathscr{P}(V)$: 

• (P1*) Symmetry: $X \perp Y \mid Z$ implies $Y \perp X \mid Z$; 

• (P2*) Decomposition: $X \perp (Y,W) \mid Z$ implies $X \perp Y \mid Z$ and $X \perp W \mid Z$; 

• (P3*) Weak Union: $X \perp (Y,W) \mid Z$ implies $X \perp Y \mid (Z,W)$; 

• (P4*) Contraction: $X \perp Y \mid (Z,W)$ and $X \perp W \mid Z$ imply $X \perp (Y,W) \mid Z$; 

• (P5*) General Intersection: For any partition $X = \bigcup_k X_k$ into finite subsets of $V$, if $X_k \perp Y \mid Z \cup (X \setminus X_k)$ 
for all $k$, then $X \perp Y \mid Z$. 

Following the nomenclature in [?], we call the relation $\cdot \perp \cdot \mid \cdot$ a semi-graphoid relation on $V$ provided 
it satisfies (P1*) - (P4*). If in addition the relation satisfies (P5*), we call it an extended graphoid relation. 

While (P1*)-(P4*) are essentially the same as their traditional finite counterparts which have already 
appeared in the literature, the statement of (P5*) is a less straightforward generalization of the usual intersection 
property to the infinite setting. We also note that the cardinalities of the various sets in the statements 
of (P1*)-(P5*) are bounded only by $|V|$, i.e., if $|V|$ is uncountably infinite then the various sets $X, Y, Z, W$ 
may also be. 

Definition 2.2. If $V$ is any set, $\mathcal{G}$ is any undirected graph on $V$, and $\cdot \perp \cdot \mid \cdot$ is any ternary relation on $\mathscr{P}(V)$, 
we say that $(V, \perp)$ satisfies the pairwise (P*), local (L*), or global (G*) set-Markov properties with respect 
to $\mathcal{G}$ provided that 

• (P*) for each pair of vertices $\{i,j\} \subseteq V$, 

$i \not\sim j \implies \{i\} \perp \{j\} \mid V \setminus \{i,j\}$; 

• (L*) for each vertex $i$, 

$\{i\} \perp V \setminus \mathrm{cl}(i) \mid \mathrm{ne}(i)$; 

• (G*) for each triple $(A,B,S)$ of disjoint subsets of $V$ with $S$ separating $A$ from $B$, we have 

$A \perp B \mid S$. 

We say that (P*), (L*), or (G*) holds over $\mathcal{G}$ when $(V, \perp)$ is clear from context. 
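
For a finite vertex set, the three set-Markov properties can be checked by brute force. The sketch below is an illustration under stated assumptions rather than anything from the paper: the ternary relation is passed as a Python callable `rel(A, B, S)` (a hypothetical interface), and the example relation is purely combinatorial, namely separation in a fixed 4-cycle. That relation satisfies all three properties with respect to the 4-cycle itself, but fails the pairwise property with respect to a path graph that omits one of the cycle's edges.

```python
from itertools import combinations

def reachable(adj, start, blocked):
    """Vertices reachable from `start` along paths avoiding `blocked`."""
    seen = set(start) - set(blocked)
    stack = list(seen)
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in blocked and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def separation_relation(adj):
    """Ternary relation: A is related to B given S iff S separates them in adj."""
    return lambda A, B, S: not (reachable(adj, A, S) & set(B))

def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def pairwise(adj, rel):
    V = set(adj)
    return all(rel({i}, {j}, V - {i, j})
               for i, j in combinations(V, 2) if j not in adj[i])

def local(adj, rel):
    V = set(adj)
    return all(rel({i}, V - adj[i] - {i}, adj[i]) for i in V)

def global_(adj, rel):
    V = set(adj)
    sep = separation_relation(adj)
    for S in subsets(V):
        for A in subsets(V - S):
            for B in subsets(V - S - A):
                if A and B and sep(A, B, S) and not rel(A, B, S):
                    return False
    return True

cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
rel = separation_relation(cycle)
print(pairwise(cycle, rel), local(cycle, rel), global_(cycle, rel))
print(pairwise(path, rel))  # vertices 0 and 3 are adjacent in the cycle
```

The enumeration in `global_` is exponential in the number of vertices, so this is only practical for very small graphs. It is meant to make the definitions concrete, not to serve as an algorithm.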

We prove the following theorem, which in particular establishes the equivalence of the set-Markov properties 
under (P1*)-(P5*), and in the case of finite $V$ reduces to the foundational result of Pearl and Paz [18]. 


Theorem 2.3. Suppose $V$ is any set, and $\cdot \perp \cdot \mid \cdot$ satisfies (P1*) - (P4*). Then 

$(V, \perp)$ satisfies (G*) $\implies$ $(V, \perp)$ satisfies (L*) $\implies$ $(V, \perp)$ satisfies (P*). 

If in addition, $\cdot \perp \cdot \mid \cdot$ satisfies (P5*), then we also have 

$(V, \perp)$ satisfies (P*) $\implies$ $(V, \perp)$ satisfies (G*). 


After establishing the set-theoretic framework outlined above, we apply Theorem 2.3 to the ternary relation 
induced by conditional independences present in a countably infinite collection of random variables. 


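Corollary 2.4. Suppose that $X = (X_n)_{n \in \mathbb{N}}$ is a collection of random variables satisfying property (P5*). Then 
for the ternary relation $\cdot \perp\!\!\!\perp_X \cdot \mid \cdot$ on $\mathbb{N}$ induced by the conditional independences in the collection of random 
variables, 

$(\mathbb{N}, \perp\!\!\!\perp_X)$ satisfies (G*) $\iff$ $(\mathbb{N}, \perp\!\!\!\perp_X)$ satisfies (L*) $\iff$ $(\mathbb{N}, \perp\!\!\!\perp_X)$ satisfies (P*). 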


The above corollary establishes the importance of (P5*), but leaves its verification untouched. Our next 
goal is to establish the equivalence of the various Markov properties for a wide class of examples of infinite 
collections of random variables by verifying (P5*). To this end, we introduce in the main text sufficient 
conditions for (P5*) to hold for a general collection of random variables (we give these sufficient conditions 
the names “infinite intersection property” (IIP) and “decorrelation property” (DCP); see Definition 5.3 for 
their statements). We then verify these conditions for two important and widely used classes of models: 
countably infinite Gaussian processes, and countably infinite discrete processes. Our primary results in this 
regard are given below. 


Theorem 2.5. Let $(X_n)_{n \in \mathbb{N}}$ be a Gaussian process with covariance matrix $\Sigma$ which satisfies the following 
bounds: 

(1) There exist constants $c$ and $C$ such that for any finite subset of nodes $A$, we have 

$0 < c < \lambda_i(\Sigma_A) < C$ 

for all $i$, where $\lambda_i(\Sigma_A)$ represents the $i$th largest eigenvalue of the matrix $\Sigma_A$. 
(2) There exists a function $g_0(i,j)$ which bounds the covariances 

$|\mathrm{Cov}(X_i, X_j)| < g_0(i,j) \quad \forall i,j$ 

and if we recursively define 

$g_{n+1}(i,j) = g_n(i,j) + \sum_{k=1}^{\infty} g_n(i,k)\, g_n(k,j),$ 


then for n <A, gnii,j) exists and is finite, and moreover 


Y^gn{i,k)lA < 


k=\ 


oo 


for some £ > 0 and all i. 
Then the ternary relation $\cdot \perp\!\!\!\perp_X \cdot \mid \cdot$ satisfies the infinite intersection property (IIP) and the decorrelation 
property (DCP), and hence (P5*). 
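
The recursion in condition (2), in which each new function is the previous one plus a sum of its products over an intermediate index, can be explored numerically on a finite truncation. The sketch below rests on assumptions that are not in the paper: the index set is cut off at a finite size N (the theorem concerns the infinite sum), and the starting bound exp(-|i-j|) is a hypothetical choice of an exponentially decaying covariance bound on a one-dimensional lattice. It computes a single step of the recursion and prints row sums, giving a rough feel for how much mass the recursion adds away from the diagonal.

```python
import math

def next_g(g):
    """One step g_{n+1}(i,j) = g_n(i,j) + sum_k g_n(i,k) * g_n(k,j),
    with the infinite sum truncated to the finite index set of `g`."""
    n = len(g)
    return [[g[i][j] + sum(g[i][k] * g[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

N = 40  # truncation level (an assumption; the theorem is about the full index set)
g0 = [[math.exp(-abs(i - j)) for j in range(N)] for i in range(N)]
g1 = next_g(g0)

mid = N // 2
print("row sum of g0 at the middle index:", sum(g0[mid]))
print("row sum of g1 at the middle index:", sum(g1[mid]))
print("g1 far from the diagonal:", g1[mid][N - 1])
```

A finite truncation can only suggest, never verify, the finiteness and summability requirements of the theorem. Those have to be established analytically for the chosen bound.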


Remark 2.6. We comment briefly on the intuitive interpretation of conditions (1) and (2) above. The eigenvalue 
condition (1) ensures a sufficient level of nonsingularity of the distribution for it to be possible to obtain 
a positive density, which will be important for verifying (IIP). Condition (2) can be interpreted as a “decorrelation” 
condition, as it forces $|\mathrm{Cov}(X_i, X_j)|$ to decay sufficiently fast as $X_i$ and $X_j$ become further apart, 
and will be important for verifying (DCP). One commonly used rich class of models for which this second 
condition holds is the lattice model with the exponential or Gaussian covariance function [22]. Full details 
for the Gaussian covariance function are included in Example 6.9. 


Remark 2.7. We note that the conditions on the $g_n$ from (2) in Theorem 2.5 are conceptually similar to 
“decorrelation” conditions required for other probabilistic results for dependent random variables. As a first 
example, the central limit theorem for dependent variables requires 

$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) = \gamma$ 

for some finite constant $\gamma$ [15]. This condition is similar to that on the $g_n$ from Theorem 2.5 in that both 
ultimately imply a certain level of decay in the covariances of the variables in question. A second example 
of a useful decorrelation condition is the often imposed assumption of absolute summability of the autocorrelations 
in a covariance-stationary sequence of random variables: 

$\sum_{k=0}^{\infty} |\mathrm{Cov}(X_0, X_k)| := \sum_{k=0}^{\infty} |\gamma_k| < \infty.$ 

This condition can be used to obtain a weak law of large numbers [?], and is actually implied by the second 
part of condition (2) in Theorem 2.5. 
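
As a concrete instance of the absolute summability condition, consider a stationary first-order autoregression. The sketch below uses the standard textbook autocovariance of such a process and is an illustration only: the parameter values are hypothetical, and the autoregressive examples treated later in the paper may be set up differently. It compares a long partial sum of the absolute autocovariances against the closed-form value of the geometric series.

```python
def ar1_autocovariance(phi, sigma2, k):
    """Cov(X_0, X_k) for a stationary AR(1) process X_t = phi * X_{t-1} + e_t,
    with |phi| < 1 and Var(e_t) = sigma2."""
    return sigma2 * phi ** abs(k) / (1.0 - phi ** 2)

phi, sigma2 = 0.6, 1.0  # hypothetical parameter values

partial = 0.0
for k in range(200):
    partial += abs(ar1_autocovariance(phi, sigma2, k))

# Geometric series: sum_k |phi|^k = 1 / (1 - |phi|).
closed_form = sigma2 / ((1.0 - phi ** 2) * (1.0 - abs(phi)))
print(partial, closed_form)
```

Because the autocovariances decay geometrically, the partial sums stabilize quickly, which is the kind of decay that decorrelation conditions are designed to capture.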


Theorem 2.8. Suppose the random variables $\{X_n \mid n \in \mathbb{N}\}$ form a $\{0,1\}$-valued stochastic process satisfying 
the following conditions: 

(1) For any finite increasing sequences of natural numbers $I = (i_1, \ldots, i_m)$ and $J = (j_1, \ldots, j_r)$, there is 
a constant $c_I$ depending only on $I$ such that for any two $\{0,1\}$-valued sequences $(a_1, \ldots, a_m)$ and 
$(b_1, \ldots, b_r)$, we have 

$P\big((X_{i_1}, \ldots, X_{i_m}) = (a_1, \ldots, a_m) \mid X_{j_1} = b_1, \ldots, X_{j_r} = b_r\big) > c_I$ a.s. $X_J$. 

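(2) Let $n \in \mathbb{N}$ and $B \subseteq \mathbb{N}$ be arbitrary, and let $\mathcal{F}_{B,n} = \sigma(\ldots)$. For any $m \in \mathbb{N}$ and $P$-a.e. 
value of $X_B = x_B$, there is a function $g_{m,B,x_B}(n)$ satisfying 

• $\lim_{n \to \infty} g_{m,B,x_B}(n) = 0$ 

• For any $A \in \mathcal{F}_{B,n}$, 

(2.1) $\mathrm{Var}\big(P(A \mid X_1, \ldots, X_m, X_B) \mid X_B = x_B\big) < g_{m,B,x_B}(n)$. 

Then the $\{X_n \mid n \in \mathbb{N}\}$ satisfy the infinite intersection property (IIP) and the decorrelation property (DCP), 
and hence (P5*). 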


Remark 2.9. The quantity $P(A \mid X_1, \ldots, X_m, X_B)$ is a random variable depending on $X_1, \ldots, X_m$ and $X_B$, so the 
conditional variance from line (2.1) is a sensible quantity. 


Remark 2.10. The conditions (1) and (2) from Theorem 2.8 serve a similar role to the conditions from 
Theorem 2.5. In particular, condition (1) again ensures “nonsingularity” of the distribution, permitting the 
existence of a positive density. Also, condition (2) corresponds once again to a quantitative formulation 
of a “decorrelation” condition. More specifically, it requires that if, after conditioning on a set of variables 
$X_B$, an event $A$ only depends on variables sufficiently far from $X_1, \ldots, X_m$, then $A$ is “nearly independent” of 
$X_1, \ldots, X_m$. 

For each of the two theorems above we provide multiple examples for which the respective required 
conditions are satisfied. These examples will serve to demonstrate the flexibility of our extended framework 
for the study of infinite-dimensional graphical models. 

3. Preliminaries 


3.1. Graph Theory. 

Definition 3.1. An undirected graph $\mathcal{G}$ is a pair of objects $(V,E)$, where $V$ is the set of vertices of $\mathcal{G}$ and 
$E$ is a subset of $V \times V$ containing the edges. By convention, $E$ does not contain edges of the form $(i,i)$ or 
multiple edges between any two vertices, and we do not distinguish between the edges $(i,j)$ and $(j,i)$. We 
now introduce a number of additional definitions related to undirected graphs. 

• Let $i$ be a vertex in $V$. A vertex $j$ is called adjacent to $i$, or a neighbor of $i$, if $(i,j) \in E$. If $j$ is 
a neighbor of $i$, we write $i \sim j$, and otherwise we write $i \not\sim j$. The set of all neighbors of $i$ is 
denoted by $\mathrm{ne}(i)$. We also define the closure set $\mathrm{cl}(i) := \{i\} \cup \mathrm{ne}(i)$. 

• Given a graph $\mathcal{G} = (V,E)$ and a subset $A \subseteq V$, the induced subgraph on $A$ is defined to be the graph 
$(A, E \cap (A \times A))$. 

• The degree of a vertex $i \in V$ is the cardinality of the neighbor set of $i$. That is, the degree of a vertex 
$i$ is equal to 

$\deg(i) = |\{j \in V \mid i \sim j\}|$. 

• Let $A, B, C \subseteq V$ be three nonempty subsets of $V$. We say that $C$ separates $A$ from $B$ if every path 
from a vertex $a \in A$ to a vertex $b \in B$ contains some vertex in $C$. 
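
The notions in Definition 3.1 translate directly into a few lines of code for a finite graph. The following minimal sketch is illustrative only: storing the edge set as a set of ordered pairs, the function names, and the four-vertex example are all assumptions, and each edge is stored once while adjacency is treated as symmetric, following the convention of not distinguishing the two orderings.

```python
def neighbors(V, E, i):
    """ne(i): all vertices adjacent to i; edges are treated as unordered."""
    return {j for j in V if (i, j) in E or (j, i) in E}

def closure(V, E, i):
    """cl(i): the vertex i together with its neighbors."""
    return {i} | neighbors(V, E, i)

def degree(V, E, i):
    """deg(i): the number of neighbors of i."""
    return len(neighbors(V, E, i))

def induced_subgraph(V, E, A):
    """The graph on A that keeps exactly the edges with both endpoints in A."""
    A = set(A)
    return A, {(i, j) for (i, j) in E if i in A and j in A}

# Hypothetical example: the path a - b - c - d.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(neighbors(V, E, "b")), sorted(closure(V, E, "b")), degree(V, E, "a"))
print(induced_subgraph(V, E, {"a", "b", "d"}))
```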


3.2. Conditional Independence. 

Definition 3.2. Suppose $X$, $Y$, and $Z$ are random variables over a common probability space. Suppose also that 
the joint probability distribution of $(X,Y,Z)$ has a density $f$ with respect to some underlying measure $\mu$ on 
the range of $(X,Y,Z)$. Then the variables $X$ and $Y$ are said to be conditionally independent given $Z$, denoted 
$X \perp\!\!\!\perp Y \mid Z$, if and only if there is a factorization 

$f_{(X,Y) \mid Z}(x,y,z) = f_{X \mid Z}(x,z)\, f_{Y \mid Z}(y,z)$ 

which holds for a.e. $x$, $y$, and $z$. See [4] for more details. 


Consider a collection of random variables $\{X_v \mid v \in V\}$. Given a subset $A \subseteq V$ let $X_A$ denote the random 
vector $(X_v)_{v \in A}$, and given subsets $A, B, C \subseteq V$, with a slight abuse of notation we shall write $A \perp\!\!\!\perp B \mid C$ to 
mean $X_A \perp\!\!\!\perp X_B \mid X_C$. 

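Now, suppose that $A$, $B$, $C$, and $D$ are disjoint collections of random variables. Then we may consider the 
following properties, known as the axioms of conditional independence: 

• (P1) Symmetry: $A \perp\!\!\!\perp B \mid C$ implies $B \perp\!\!\!\perp A \mid C$; 

• (P2) Decomposition: $A \perp\!\!\!\perp (B,C) \mid D$ implies $A \perp\!\!\!\perp B \mid D$ and $A \perp\!\!\!\perp C \mid D$; 

• (P3) Weak Union: $A \perp\!\!\!\perp (B,C) \mid D$ implies $A \perp\!\!\!\perp B \mid (C,D)$; 

• (P4) Contraction: $A \perp\!\!\!\perp B \mid (C,D)$ and $A \perp\!\!\!\perp C \mid D$ imply $A \perp\!\!\!\perp (B,C) \mid D$; 

• (P5) Intersection: $A \perp\!\!\!\perp B \mid (C,D)$ and $A \perp\!\!\!\perp C \mid (B,D)$ imply $A \perp\!\!\!\perp (B,C) \mid D$. 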

We note that properties (P1) - (P4) can be proved for arbitrary collections of random variables (see 
Section 5 below). Property (P5) does not hold under such extreme generality, but the following result does 
hold (as verified in [14], for example). 


Proposition 3.3 ([14]). Suppose for all $v \in V$ that $X_v$ is a random variable. If every finite subcollection of the 
$X_v$ has a joint density which is everywhere positive, then (P1) - (P5) hold for all finite subsets $A, B, C, D \subseteq 
\{X_v \mid v \in V\}$. 
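
The positivity assumption in Proposition 3.3 cannot simply be dropped. A standard textbook counterexample, which is not taken from this paper, uses three binary variables that are almost surely equal: each pair is conditionally independent given the third, yet no variable is independent of the other two jointly, so the intersection property fails. The sketch below checks this numerically. The function names are hypothetical, and conditional independence is tested through the finite-state identity p(a,b,c) p(c) = p(a,c) p(b,c).

```python
from itertools import product

def marginal(pmf, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(pmf, A, B, C, tol=1e-12):
    """Check that X_A and X_B are conditionally independent given X_C for a
    pmf on {0,1}^n, using p(a,b,c) * p(c) == p(a,c) * p(b,c) for all values."""
    n = len(next(iter(pmf)))
    pabc = marginal(pmf, A + B + C)
    pac = marginal(pmf, A + C)
    pbc = marginal(pmf, B + C)
    pc = marginal(pmf, C)
    for x in product((0, 1), repeat=n):
        a = tuple(x[i] for i in A)
        b = tuple(x[i] for i in B)
        c = tuple(x[i] for i in C)
        lhs = pabc.get(a + b + c, 0.0) * pc.get(c, 0.0)
        rhs = pac.get(a + c, 0.0) * pbc.get(b + c, 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True

# X_0 = X_1 = X_2 almost surely, each a fair coin: the joint pmf is not positive.
pmf = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(cond_indep(pmf, [0], [1], [2]))    # X_0 and X_1 given X_2
print(cond_indep(pmf, [0], [2], [1]))    # X_0 and X_2 given X_1
print(cond_indep(pmf, [0], [1, 2], []))  # X_0 and (X_1, X_2) jointly
```

The first two checks succeed and the third fails, so both hypotheses of (P5) hold (with D empty) while its conclusion does not. In graphical terms, this distribution satisfies the pairwise Markov property with respect to the graph with no edges but not the global one.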


Remark 3.4. In Definition 2.1 we used the * in the labeling of properties (P1*) - (P5*) and $\perp$ instead of $\perp\!\!\!\perp$ 
to distinguish those properties as abstract properties of a ternary relation. This is as opposed to (P1) - (P5) 
above which are specifically related to conditional independence. 

Remark 3.5. The only significant difference between (P1)-(P5) and (P1*)-(P5*) occurs in the statement 
of (P5*). As we will show in Proposition B.1, (P5) and (P5*) are equivalent when considering a finite 
collection of random variables. However, the more nuanced property (P5*) is necessary in order to obtain 
probability theoretic results for infinite collections of random variables, which cannot be obtained from (P5) 
alone. 


3.3. Markov Properties and Graphical Models. Suppose now that $\mathcal{G} = (V,E)$ is an undirected graph 
with $|V| < \infty$. Let $X = (X_1, \ldots, X_{|V|})$ be a $|V|$-variate random variable. Then we say that $X$ satisfies the (P) 
pairwise, (L) local, or (G) global Markov properties with respect to the graph $\mathcal{G}$ provided that: 

• (P) for each pair $(i,j)$ of vertices, 

$i \not\sim j$ (i.e., $i$ and $j$ not adjacent) $\implies X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}$; 

• (L) for each vertex $i$, 

$X_i \perp\!\!\!\perp X_{V \setminus \mathrm{cl}(i)} \mid X_{\mathrm{ne}(i)}$; 

• (G) for each triple $(A,B,S)$ of disjoint subsets of $V$ with $S$ separating $A$ and $B$, we have 

$X_A \perp\!\!\!\perp X_B \mid X_S$. 

The following foundational result of Pearl and Paz [18] shows that, under general circumstances, these 
three Markov properties are equivalent. 

Theorem 3.6 (Pearl and Paz, 1985 [18]). Suppose $V$ is a finite set, and that $X$ is a collection of random 
variables indexed by $V$ such that for any disjoint non-empty subsets $A, B, C, D \subseteq V$ the intersection property 

(P5) $X_A \perp\!\!\!\perp X_B \mid X_{C \cup D}$ and $X_A \perp\!\!\!\perp X_C \mid X_{B \cup D}$ $\implies$ $X_A \perp\!\!\!\perp X_{B \cup C} \mid X_D$ 

holds. Then with respect to a given graph $\mathcal{G}$, 

$X$ satisfies (P) $\iff$ $X$ satisfies (L) $\iff$ $X$ satisfies (G). 


4. Graphoid Relations and Separation for Infinite Graphs 


We now introduce a graphical framework in which we can analyze the Markov properties from a purely set 
theoretic perspective which is independent of probability theory. Making this distinction in our mathematical 
treatment will allow us to understand better what drives the equivalence of the Markov properties. 


4.1. Extended Graphoids and Semi-Graphoids. Conditions (P1*) - (P4*) are precisely the same as (P1) 
- (P4) when the relation in question is that induced by conditional independence (as in Definition 3.2). While 
the statements of (P5*) and (P5) are different, (P5*) is in fact a generalization of (P5) which is equivalent 
when $V$ is finite, a result stated formally in Proposition B.1 in the Supplemental section. 

We now recall Definition 2.2 of the set-theoretic analogues of the Markov properties for ternary relations, 
which we will ultimately consider for semi-graphoid and extended graphoid relations. 


Definition. If $V$ is any set, $\mathcal{G}$ is any undirected graph on $V$, and $\cdot \perp \cdot \mid \cdot$ is any ternary relation on $\mathscr{P}(V)$, 
we say that $(V, \perp)$ satisfies the pairwise (P*), local (L*), or global (G*) set-Markov properties with respect 
to $\mathcal{G}$ provided that 

• (P*) for each pair of vertices $\{i,j\} \subseteq V$, we have $i \not\sim j \implies \{i\} \perp \{j\} \mid V \setminus \{i,j\}$; 

• (L*) for each vertex $i$, we have $\{i\} \perp V \setminus \mathrm{cl}(i) \mid \mathrm{ne}(i)$; 

• (G*) for each triple $(A,B,S)$ of disjoint subsets of $V$ with $S$ separating $A$ from $B$, we have $A \perp B \mid S$. 

We say that (P*), (L*), or (G*) holds over $\mathcal{G}$ when $(V, \perp)$ is clear from context. 


Note that if $V$ is finite, and $\cdot \perp \cdot \mid \cdot$ is the conditional independence relation from Definition 3.2, these 
set-Markov properties are precisely the usual probabilistic Markov properties. That is, (P) = (P*), (L) = 
(L*), and (G) = (G*). 

We now prove Theorem 2.3, which asserts the equivalence of the set-theoretic Markov properties under 
(P1*)-(P5*). 

Theorem. Suppose $V$ is any set, and $\cdot \perp \cdot \mid \cdot$ satisfies (P1*) - (P4*). Then 

$(V, \perp)$ satisfies (G*) $\implies$ $(V, \perp)$ satisfies (L*) $\implies$ $(V, \perp)$ satisfies (P*). 

If in addition $\cdot \perp \cdot \mid \cdot$ satisfies (P5*), then we also have 

$(V, \perp)$ satisfies (P*) $\implies$ $(V, \perp)$ satisfies (G*). 

Proof. 

(G*) $\implies$ (L*): 

For any subset $A \subseteq V$, the neighbor set $\mathrm{ne}(A)$ separates $A$ and $V \setminus (A \cup \mathrm{ne}(A))$, so (G*) trivially implies (L*). 

(L*) $\implies$ (P*): 

Let $i, j$ be any two vertices in $V$ which are not adjacent. Then $j \notin \mathrm{cl}(i)$. By (L*), $i \perp V \setminus \mathrm{cl}(i) \mid \mathrm{ne}(i)$. Since 
$j \notin \mathrm{cl}(i)$, we may conclude from (P3*) that $i \perp j \mid V \setminus \{i,j\}$, proving (P*). 

(P*) $\implies$ (G*): 

Suppose that $A, B, S \subseteq V$ are such that $S$ separates $A$ and $B$. Let $\tilde{A}$ be the set of vertices in $V$ that can be reached 
by a path starting at some vertex in $A$ and which does not include any vertex in $S$. Define $\tilde{B} = V \setminus (\tilde{A} \cup S)$, 
noting that $S$ separates $\tilde{A}$ and $\tilde{B}$, and that $B \subseteq \tilde{B}$. 

Fix a vertex $i \in \tilde{A}$. Then, since $i$ is separated from $\tilde{B}$ by $S$, we know in particular that for any $j \in \tilde{B}$, $i$ 
and $j$ are not adjacent. Therefore, by (P*), we have $i \perp j \mid V \setminus \{i,j\}$, and we also have that $V \setminus \{i,j\} = 
(\tilde{A} \setminus \{i\}) \cup S \cup (\tilde{B} \setminus \{j\})$. Since this holds for all $j$ in $\tilde{B}$, we have by property (P5*) that $i \perp \tilde{B} \mid (\tilde{A} \setminus \{i\}) \cup S$, and 
a second application of (P5*) gives $\tilde{A} \perp \tilde{B} \mid S$. 

Finally, since $A \subseteq \tilde{A}$ and $B \subseteq \tilde{B}$, two applications of property (P2*) allow us to conclude $A \perp B \mid S$, 
finishing the proof that (P*) $\implies$ (G*). □ 
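
The enlargement step in the last part of the proof, which replaces the two separated sets by larger sets that together with the separator cover all of V, can be made concrete on a finite graph. The sketch below is only an illustration: the function name `enlarge`, the adjacency-dictionary representation, and the example graph are assumptions. It computes the enlarged sets and confirms that no edge joins them, which is what allows (P*) to be applied to every pair consisting of one vertex from each.

```python
def enlarge(adj, A, S):
    """Return the enlarged sets used in the proof: the first is everything
    reachable from A by paths avoiding S, the second is the rest of V minus S."""
    S = set(S)
    a_tilde = set(A) - S
    stack = list(a_tilde)
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in S and w not in a_tilde:
                a_tilde.add(w)
                stack.append(w)
    b_tilde = set(adj) - a_tilde - S
    return a_tilde, b_tilde

# Hypothetical graph: three arms of length two meeting at vertex 0.
adj = {0: {1, 3, 5}, 1: {0, 2}, 2: {1}, 3: {0, 4}, 4: {3}, 5: {0, 6}, 6: {5}}
A, B, S = {2}, {4}, {0}
a_tilde, b_tilde = enlarge(adj, A, S)
print(sorted(a_tilde), sorted(b_tilde))
# No edge joins the two enlarged sets, so every cross pair is non-adjacent.
print(all(w not in b_tilde for v in a_tilde for w in adj[v]))
```

In this example the enlarged sets are {1, 2} and {3, 4, 5, 6}: they contain the original sets {2} and {4}, and together with the separator {0} they partition the vertex set, as the proof requires.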

Remark 4.1. We note explicitly that Theorem 2.3 is valid for $V$ of arbitrary (even uncountable) cardinality. 
Moreover, although Theorem 2.3 deals with $V$ of arbitrary cardinality, it is not necessary to invoke the axiom 
of choice to obtain the result. 


We are now in a position to deduce the result of Pearl and Paz [18] formulated in terms of set-Markov 
properties as a special case of Theorem 2.3 above. 


Corollary 4.2 (Pearl and Paz, 1985 [18]). Suppose $V$ is a finite set and $\cdot \perp \cdot \mid \cdot$ is a ternary relation on $\mathscr{P}(V)$ 
satisfying (P1*)-(P4*), and is such that for any disjoint non-empty subsets $A, B, C, D \subseteq V$ the intersection 
property 

(P5) $A \perp B \mid C \cup D$ and $A \perp C \mid B \cup D$ $\implies$ $A \perp B \cup C \mid D$ 

holds. Then 

$\cdot \perp \cdot \mid \cdot$ satisfies (P) $\iff$ $\cdot \perp \cdot \mid \cdot$ satisfies (L) $\iff$ $\cdot \perp \cdot \mid \cdot$ satisfies (G). 


Proof. By Proposition B.1, the assumption of (P5) is equivalent to that of (P5*). Thus, we may apply 
Theorem 2.3. □ 


Remark 4.3. The only important difference between the proof of Theorem 2.3 and the proof of Corollary
4.2 from Pearl and Paz (1985) is in the proof of (P*) $\Rightarrow$ (G*). For this step, Pearl and Paz rely on the
pairwise statement of (P5) and a reverse induction argument starting with the largest separator for which the
claim fails. Our proof instead appeals to the specific way in which (P5*) is formulated. Combining the
equivalence of (P5) and (P5*) from Proposition B.1 with the argument we used to prove Theorem 2.3 would
provide an alternate proof of the equivalence of the usual finite, probabilistic Markov properties.


One could ask whether the general intersection property (P5*) is the minimal assumption
required to get equivalence of the set-Markov properties. We will now show that under (P1*)-(P4*), the
assumption (P5*) is equivalent to the statement (P*) $\Rightarrow$ (G*).

Proposition 4.4. If (P1*)-(P4*) hold, and (P*) $\Rightarrow$ (G*), then (P5*) must hold.

Proof. Suppose that (P*) implies (G*), and that for some $I \subseteq V$, and all $i \in I$, we have

$X_i \perp Y \mid (X_j)_{j \in I \setminus \{i\}} \cup Z.$

Then let $\mathcal{G} = (V,E)$ be the graph on $V$ with one node corresponding to each of the $X_i$, and also nodes for $Y$ and $Z$,
and let the edge set be chosen so that two nodes are joined by an edge precisely when the independence demanded
by (P*) for those two nodes fails. That is, if $a$ and $b$ are two nodes, then $\{a,b\} \in E$ if and only if
$a \not\perp b \mid V \setminus \{a,b\}$. Then for this choice of $E$, (P*) trivially holds, and since we are assuming (P*) implies (G*),
we also have that (G*) holds. By hypothesis, there are no edges from any $X_i$ to $Y$, and so $Z$ separates $X_I$ and $Y$.
Therefore by (G*) we may conclude $X_I \perp Y \mid Z$. □

The above proposition shows that (P5*) is a necessary property to obtain the equivalence of the set- 
Markov properties, and so is indeed minimal. Importantly, this affirms the notion that (P5*) is the correct 
generalization of (P5). 


5. Markov Properties for Infinite Sets of Random Variables 

Our goal in this section is to obtain general conditions under which (P5*) holds for the conditional
independence relation. This will allow us to verify the equivalence of the infinite probabilistic Markov properties.

5.1. Axioms of Conditional Probability. The first step toward this goal requires us to introduce the most
general definition of conditional independence, significantly generalizing Definition 3.2.

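Definition 5.1. Given collections of random variables $A$, $B$, and $C$, we say that

$A \perp\!\!\!\perp B \mid C$

provided the $\sigma$-algebras $\sigma(A)$, $\sigma(B)$, and $\sigma(C)$ satisfy

(5.1) $\mathbb{P}(E_1 \mid \sigma(C))\,\mathbb{P}(E_2 \mid \sigma(C)) = \mathbb{P}(E_1 \cap E_2 \mid \sigma(C))$

for all events $E_1 \in \sigma(A)$ and $E_2 \in \sigma(B)$.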

It turns out that properties (P1*)-(P4*) hold for any collection of random variables, even for this more 
general form of the conditional independence relation. 

Lemma 5.2. If $(X_v)_{v \in V}$ is any collection of random variables with $V$ at most countably infinite, then
(P1*)-(P4*) hold with respect to the conditional independence relation $\cdot \perp\!\!\!\perp \cdot \mid \cdot$.

Proof. The proof follows from measure theoretic arguments. Similar results are commonly referenced in 
the literature, and some proofs can be found, for example, in |l5l[l3- However, we have been unable to find 
a reference that lists each of the properties (P1*)-(P4*) at our level of generality. Thus, we include our own
proofs in the Supplemental section for the sake of completeness. □ 

In the next section, we will consider the generality in which (P5*) holds. 

5.2. Sufficient Conditions for the General Intersection Property. When the vertex set V is finite, there 
are well-known sufficient conditions for the intersection property (P5) to hold (see lfT9l for an approach 
involving densities and product measures, and see |!5l for a more general $\sigma$-algebra oriented approach). In
particular, (P5) holds if the random variables have a joint density which is positive everywhere. However,
in the infinite setting, sufficient conditions for the general intersection property (P5*) to hold are not
immediately evident. We address this by introducing a set of sufficient conditions under which the general
intersection property holds for the conditional independence relation. 
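
To see concretely why some positivity assumption is needed for (P5) in the finite case, the following Python sketch checks the two premises and the conclusion of (P5) (with $D$ empty) by exact enumeration for two toy distributions on $\{0,1\}^3$. The helper names `marginal` and `cond_indep` and both toy distributions are illustrative assumptions, not objects from the paper.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, idx):
    """Marginal law of the coordinates listed in the tuple `idx`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def cond_indep(joint, A, B, C):
    """Check X_A indep. of X_B given X_C via P(a,b,c) P(c) = P(a,c) P(b,c)."""
    p_abc = marginal(joint, A + B + C)
    p_ac = marginal(joint, A + C)
    p_bc = marginal(joint, B + C)
    p_c = marginal(joint, C)
    values = sorted({v for x in joint for v in x})
    for a in product(values, repeat=len(A)):
        for b in product(values, repeat=len(B)):
            for c in product(values, repeat=len(C)):
                lhs = p_abc.get(a + b + c, 0) * p_c.get(c, 0)
                rhs = p_ac.get(a + c, 0) * p_bc.get(b + c, 0)
                if lhs != rhs:
                    return False
    return True


half = Fraction(1, 2)
# Three copies of one fair coin: the joint law is not everywhere positive.
degenerate = {(0, 0, 0): half, (1, 1, 1): half}
# Three independent fair coins: everywhere positive.
uniform = {x: Fraction(1, 8) for x in product((0, 1), repeat=3)}

for name, joint in (("degenerate", degenerate), ("uniform", uniform)):
    premise_1 = cond_indep(joint, (0,), (1,), (2,))   # X0 indep X1 | X2
    premise_2 = cond_indep(joint, (0,), (2,), (1,))   # X0 indep X2 | X1
    conclusion = cond_indep(joint, (0,), (1, 2), ())  # X0 indep (X1, X2)
    print(name, premise_1, premise_2, conclusion)
```

For the degenerate law both premises hold while the conclusion fails, so (P5) is violated; for the positive law all three statements hold, in line with the sufficient condition quoted above.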
Definition 5.3. Let $\{X_i \mid i \in \mathbb{N}\}$ be a collection of real-valued random variables. The $X_i$ are said to satisfy
the Infinite Intersection Property (IIP) and Decorrelation Property (DCP) respectively provided that:

(IIP) Given any (possibly infinite) subset $D \subseteq \mathbb{N}$, and any finite $A,B,C \subseteq \mathbb{N} \setminus D$, we have that
$(X_A \perp\!\!\!\perp X_B \mid X_C, X_D$ and $X_A \perp\!\!\!\perp X_C \mid X_B, X_D) \implies X_A \perp\!\!\!\perp (X_B, X_C) \mid X_D.$

(DCP) Given any (potentially infinite) subset $D \subseteq \mathbb{N}$, and any event

$E \in \bigcap_{n} \sigma(X_D, X_n, X_{n+1}, X_{n+2}, \ldots),$

there exists an event $E' \in \sigma(X_D)$ such that $\mathbb{P}(E \,\Delta\, E') = 0$.

Remark 5.4. In the finite-dimensional setting, the infinite intersection property (IIP) reduces precisely to
(P5), since in that case all subsets $A,B,C$, and $D$ under consideration will be finite. In addition, the
decorrelation property (DCP) holds trivially in the finite-dimensional setting since, in that case,

$\bigcap_{n} \sigma(X_D, X_n, X_{n+1}, \ldots) = \sigma(X_D).$

Thus, the assumption of both the infinite intersection property (IIP) and decorrelation property (DCP)
reduces precisely to (P5) in the finite-dimensional setting. As we shall see, (IIP) and (DCP) are also sufficient
to obtain (P5*) in the infinite-dimensional setting.

Remark 5.5. One can think of (IIP) as an intermediate infinite extension of (P5) which is necessary for, but 
not quite as strong as (P5*), and hence is more readily verifiable. (DCP) is not a necessary condition for 
(P5*), but provides a quantitative formulation of the idea that the infinite collection of variables is sufficiently 
“decorrelated,” which in turn allows the verification of (P5*). 

Remark 5.6. It may appear that verifying (DCP) for $D = \emptyset$ implies (DCP) for all other possible choices of
$D$, with the key step in such an argument being

$\bigcap_{n} \sigma(X_D, X_n, X_{n+1}, \ldots) = \sigma\Big(X_D, \bigcap_{n} \sigma(X_n, X_{n+1}, \ldots)\Big).$

However, the above expression does not hold in general, even if equality is weakened to equivalence, in the
sense that two $\sigma$-algebras $\mathcal{F}_1$ and $\mathcal{F}_2$ are equivalent if for any $A \in \mathcal{F}_1$ there exists $B \in \mathcal{F}_2$ with
$\mathbb{P}(A \,\Delta\, B) = 0$, and vice versa (some conditions under which such an equivalence holds are listed in [23]).
Because this equality does not always hold, it is indeed necessary to verify (DCP) for all subsets $D \subseteq \mathbb{N}$.

Theorem 5.7. Suppose that $\{X_n \mid n \in \mathbb{N}\}$ is a collection of random variables satisfying properties (IIP) and
(DCP). Then the ternary relation $\cdot \perp\!\!\!\perp \cdot \mid \cdot$ corresponding to conditional independence satisfies (P5*).

Proof. Let $I, D \subseteq \mathbb{N}$ be arbitrary (potentially infinite) subsets, let $C \subseteq \mathbb{N}$ be finite, and suppose that
$X_i \perp\!\!\!\perp X_C \mid (X_j)_{j \in I \setminus \{i\}}, X_D$ for all $i \in I$. By iterative application of property (IIP) we may conclude
that for any finite $J \subseteq I$,

$X_J \perp\!\!\!\perp X_C \mid X_{I \setminus J}, X_D.$

We may therefore assume that $I$ is infinite, since the above verifies (P5*) when $I$ is finite.

Let $I = \{i_1, i_2, i_3, \ldots\}$, and let $I_n = \{i_n, i_{n+1}, i_{n+2}, \ldots\}$. Applying property (P2) gives

$X_J \perp\!\!\!\perp X_C \mid X_{I \setminus K}, X_D$

for any $K$ satisfying $J \subseteq K \subseteq I$, and so for all sufficiently large $n$, we have

$X_J \perp\!\!\!\perp X_C \mid X_{I_n}, X_D.$

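Now, let $A$ be an event in $\sigma(X_J)$ and $B$ an event in $\sigma(X_C)$. Then by the backward martingale convergence
theorem (for a reference, see e.g. iflAll), it holds pointwise that

$\lim_{n \to \infty} \mathbb{P}(A \mid X_{I_n}, X_D) = \mathbb{P}(A \mid \mathcal{T}),$

$\lim_{n \to \infty} \mathbb{P}(B \mid X_{I_n}, X_D) = \mathbb{P}(B \mid \mathcal{T}),$ and

$\lim_{n \to \infty} \mathbb{P}(A \cap B \mid X_{I_n}, X_D) = \mathbb{P}(A \cap B \mid \mathcal{T}),$

where $\mathcal{T} = \bigcap_{n} \sigma(X_{I_n}, X_D)$. Thus, since the independence factorization holds almost surely for all sufficiently
large $n$, we have that almost surely $\mathbb{P}(A \mid \mathcal{T})\,\mathbb{P}(B \mid \mathcal{T}) = \mathbb{P}(A \cap B \mid \mathcal{T})$.

Now, note that $\sigma(X_D) \subseteq \mathcal{T}$ and by property (DCP), for any $E \in \mathcal{T}$ there exists $E' \in \sigma(X_D)$ satisfying
$\mathbb{P}(E \,\Delta\, E') = 0$. We may therefore apply Lemma A.4 and obtain that $\mathbb{P}(A \mid \mathcal{T}) = \mathbb{P}(A \mid X_D)$, and similarly for
$B$ and $A \cap B$, and so we have $A \perp\!\!\!\perp B \mid X_D$.

Since $A$ and $B$ were arbitrary elements of $\sigma(X_J)$ and $\sigma(X_C)$ respectively, we have that $X_J \perp\!\!\!\perp X_C \mid X_D$ for
all finite $J \subseteq I$.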

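Now, let $E \in \sigma(X_I)$. By Proposition A.6 there is a sequence of sets $E_n \in \sigma(X_{i_1},\ldots,X_{i_n})$ such that
$\mathbb{P}(E \,\Delta\, E_n) \to 0$. Since $E_n \in \sigma(X_J)$ for a finite $J \subseteq I$, we have $E_n \perp\!\!\!\perp X_C \mid X_D$. Let $F \in \sigma(X_C)$ be arbitrary.
Then

(5.2) $\mathbb{P}(E_n \mid X_D)\,\mathbb{P}(F \mid X_D) - \mathbb{P}(E_n \cap F \mid X_D) = 0$

for all $n$ almost surely. Since $\mathbb{P}(E \,\Delta\, E_n) \to 0$, we have that $\mathbb{E}[1_{E \Delta E_n}] \to 0$. So $\mathbb{E}[\mathbb{E}[1_{E \Delta E_n} \mid X_D]] \to 0$, and since
$\mathbb{E}[1_{E \Delta E_n} \mid X_D] = \mathbb{P}(E \,\Delta\, E_n \mid X_D) \ge 0$, we may conclude (passing to a subsequence if necessary) that
$\mathbb{P}(E \,\Delta\, E_n \mid X_D) \to 0$ almost surely. Thus, since

$|\mathbb{P}(E \mid X_D) - \mathbb{P}(E_n \mid X_D)| \le |\mathbb{P}(E \setminus E_n \mid X_D)| + |\mathbb{P}(E_n \setminus E \mid X_D)| = |\mathbb{P}(E \,\Delta\, E_n \mid X_D)|,$

we may conclude that $\mathbb{P}(E_n \mid X_D) \to \mathbb{P}(E \mid X_D)$ almost surely. A similar argument works to show that
$\mathbb{P}(E_n \cap F \mid X_D) \to \mathbb{P}(E \cap F \mid X_D)$ almost surely, and therefore we may obtain from line (5.2) that almost surely

$\mathbb{P}(E \mid X_D)\,\mathbb{P}(F \mid X_D) - \mathbb{P}(E \cap F \mid X_D) = 0.$

This finishes the proof that $X_I \perp\!\!\!\perp X_C \mid X_D$ for finite $C$.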

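Suppose now that $C$ is infinite. Then for any finite subset $C' \subseteq C$, we have $X_i \perp\!\!\!\perp X_{C'} \mid (X_j)_{j \in I \setminus \{i\}}, X_D$, and
so the above arguments show that $X_I \perp\!\!\!\perp X_{C'} \mid X_D$ for any finite $C' \subseteq C$. Note that
$\mathcal{P} := \{E \in \sigma(X_C) \mid E \in \sigma(X_{C'})$ for some finite $C' \subseteq C\}$ is a $\pi$-system, and that for any event $E_I \in \sigma(X_I)$,
the set of events $E \in \sigma(X_C)$ satisfying

(5.3) $\mathbb{P}(E_I \mid X_D)\,\mathbb{P}(E \mid X_D) = \mathbb{P}(E_I \cap E \mid X_D)$

forms a $\lambda$-system containing $\mathcal{P}$. We may then conclude by Dynkin's $\pi$-$\lambda$ theorem that line (5.3) is satisfied
for all $E_I \in \sigma(X_I)$ and all $E \in \sigma(\mathcal{P}) = \sigma(X_C)$, and so $X_I \perp\!\!\!\perp X_C \mid X_D$ even if $C$ is infinite. Thus we have
demonstrated (P5*). □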


The following result provides a convenient way to verify (DCP) in practice. 

Lemma 5.8. Let $(X_n)_{n \in \mathbb{N}}$ be a collection of random variables satisfying the following:

For any $D \subseteq \mathbb{N}$, let $\mathcal{F}_{D:n} = \sigma(X_D, X_n, X_{n+1}, \ldots)$. For any $m \in \mathbb{N}$, and $\mathbb{P}$-a.e. value of $X_D = x_D$, there is a
function $g_{m,D,x_D}(n)$ satisfying

(1) $\lim_{n \to \infty} g_{m,D,x_D}(n) = 0$, and

(2) For any $A \in \mathcal{F}_{D:n}$ we have

(5.4) $\operatorname{Var}\big(\mathbb{P}(A \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) \le g_{m,D,x_D}(n)$ a.s.

Then the $(X_n)_{n \in \mathbb{N}}$ satisfy property (DCP).

More strongly, for (DCP) to hold, it is sufficient that inequality (5.4) be valid for all $A \in \mathcal{F}_{D:n:N}$ (for
all $N > n$), where

$\mathcal{F}_{D:n:N} := \sigma(X_D, X_n, X_{n+1}, \ldots, X_N).$

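Proof. Let $D \subseteq \mathbb{N}$ and $A \in \bigcap_{n} \mathcal{F}_{D:n}$, where $\mathcal{F}_{D:n} = \sigma(X_D, X_n, X_{n+1}, \ldots)$, be arbitrary. By assumption,

$\operatorname{Var}\big(\mathbb{P}(A \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) \le g_{m,D,x_D}(n)$ a.s. $X_D$, for all $n$,

and thus

$\operatorname{Var}\big(\mathbb{P}(A \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) = 0$ a.s. $X_D$.

Thus, we have that $\mathbb{P}(A \mid X_1,\ldots,X_m,X_D)$ is almost surely just a function of $X_D$. That is,

$\mathbb{P}(A \mid X_1,\ldots,X_m,X_D) = \mathbb{P}(A \mid X_D)$

almost surely for all $m$. By the Lévy zero-one law, we may conclude

$1_A = \lim_{m \to \infty} \mathbb{P}(A \mid X_D, X_1, \ldots, X_m) = \mathbb{P}(A \mid X_D)$

almost surely, so there is some $A' \in \sigma(X_D)$ such that $1_A = 1_{A'}$ almost surely, or equivalently, $\mathbb{P}(A \,\Delta\, A') = 0$.
Thus (DCP) holds.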

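Now, suppose instead that we only assume

$\operatorname{Var}\big(\mathbb{P}(E \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) \le g_{m,D,x_D}(n)$ a.s. $X_D$

for those $E$ contained in $\mathcal{F}_{D:n:N} = \sigma(X_D, X_n, \ldots, X_N)$ for some $n, N$. The algebra given by
$\mathcal{A}_n = \bigcup_{N} \sigma(X_D, X_n, X_{n+1}, \ldots, X_N)$ generates $\mathcal{F}_{D:n} = \sigma(X_D, X_n, X_{n+1}, \ldots)$, and so by Proposition A.6, for all
$n \in \mathbb{N}$, $\varepsilon > 0$, there exists some $N \in \mathbb{N}$ and an event $A_{n,\varepsilon} \in \mathcal{F}_{D:n:N}$ such that $\mathbb{P}(A \,\Delta\, A_{n,\varepsilon}) < \varepsilon$. Thus, in
particular, we have $1_{A_{n,\varepsilon}} \to 1_A$ almost surely along a suitable sequence $\varepsilon \to 0$.

By the dominated convergence theorem, we may thus conclude

$\operatorname{Var}\big(\mathbb{P}(A_{n,\varepsilon} \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) \to \operatorname{Var}\big(\mathbb{P}(A \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big),$

and therefore that

$\operatorname{Var}\big(\mathbb{P}(A \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) \le g_{m,D,x_D}(n)$

for all $n$ (since $A \in \bigcap_{n} \mathcal{F}_{D:n}$). Then we may follow the above argument to obtain the existence of $A' \in \sigma(X_D)$
such that $\mathbb{P}(A \,\Delta\, A') = 0$, and conclude that (DCP) holds. □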


5.3. Analysis of Properties (IIP), (DCP), and (P5*). There are a few natural questions that immediately 
arise regarding the technical conditions required to obtain the equivalence of the Markov properties. In this 
section, we consider the following three: 

• Is a more straightforward extension of the intersection property (as given by (P5) in the finite setting) 
sufficient to obtain the equivalence of the various Markov properties in the infinite setting? 

• Does the infinite intersection property (IIP) imply the decorrelation property (DCP) or vice versa? 

• Is the infinite intersection property (IIP) or the decorrelation property (DCP) in any sense a necessary 
condition for (P5*)? 


We begin by addressing the first question. One may consider the following naive extension of (P5) to
the infinite case: assume that (P5) holds for all finite subcollections of the infinite set of variables. This
assumption is perhaps the most natural way of extending (P5), and is strictly weaker than the infinite
intersection property (IIP) (which itself can be viewed as an extension of (P5)). However, this assumption is not
sufficient to obtain the equivalence of the Markov properties, as shown in Example B.3 in the Supplemental
Section. In fact, we show in Example B.3 that even adding the decorrelation property (DCP) to this naive
extension of (P5) is not sufficient to obtain the equivalence of the Markov properties. This demonstrates the
ineffectiveness of the naive extension of (P5). The infinite intersection property (IIP) augments this
extension of (P5) only by allowing the conditioning set to be infinite, and in light of Theorem 5.7 can be viewed
as the correct extension of property (P5).

We now consider the question of whether (DCP) implies (IIP) or vice versa. First note that, as mentioned
in Remark 5.4 above, in the finite setting the decorrelation property (DCP) holds trivially, and the infinite
intersection property (IIP) is equivalent to the usual intersection property (P5). Since (P5) does not hold
in general even in the finite case, this shows that the decorrelation property (DCP) alone is not a sufficient
condition for (P5*), and also that (DCP) does not imply (IIP). The previously mentioned Example B.3 also
provides a less trivial example where the decorrelation condition (DCP) is satisfied and where (P5*) is not.
This shows that the decorrelation property (DCP) by itself implies neither the infinite intersection property
(IIP) nor the general intersection property (P5*).

Next, we address whether (IIP) implies (DCP). In fact, (IIP) does not imply (DCP), and moreover (IIP)
is not a sufficient condition for (P5*) by itself. We demonstrate this in Example B.4 in the Supplemental
Section, in which we provide a collection of random variables which satisfies (IIP), but which does not
satisfy (DCP), and for which the equivalence of the Markov properties does not hold. This shows that
(IIP) by itself is not enough to imply (P5*), and thus demonstrates the importance of also verifying (DCP)
when one wishes to appeal to the equivalence of the Markov properties for an infinite collection of random
variables.

The question that remains is whether the properties (IIP) and/or (DCP) are necessary for (P5*). It is easy 
to see that (IIP) is directly implied by (P5*), so that (IIP) is indeed a necessary condition for the equivalence 
of the Markov properties. However, the theoretical relationship between (P5*) and (DCP) is less clear. 

As demonstrated by Theorem 5.7, (DCP) is a convenient criterion which can be used with (IIP) to verify
the equivalence of the Markov properties. In addition, for many applications of interest, (DCP) is satisfied
due to the “decorrelation” of the random variables (see Remark 5.5). However, we demonstrate in Example
B.5 of the Supplemental Section that (DCP) is not a necessary condition for (P5*). Despite this, the
following proposition gives a partial converse to Theorem 5.7. In particular, this proposition provides some
settings under which some of the implications required by the definition of (DCP) are implied by (P5*), thus
providing more justification that (DCP) is a reasonable condition to impose.


Proposition 5.9. Suppose that $(X_n)_{n=1}^{\infty}$ is a collection of random variables for which (P5*) holds, and that
$D \subseteq \mathbb{N}$ and $H = \{h_1, h_2, \ldots\} \subseteq \mathbb{N}$ are such that in the graph on $\mathbb{N}$ induced by (P*), there are no paths
between distinct elements of $H$ that do not pass through $D$. Then for any event $E \in \bigcap_{n} \sigma(X_D, X_{h_n}, X_{h_{n+1}}, \ldots)$,
there exists $E' \in \sigma(X_D)$ such that $\mathbb{P}(E \,\Delta\, E') = 0$.

Proof. Since (P5*) holds, (P*) is equivalent to (G*), and so the condition on the elements of $H$ relative to
the (P*)-induced graph implies that for any distinct $h_i, h_j \in H$ we have $X_{h_i} \perp\!\!\!\perp X_{h_j} \mid X_D$.

Now, suppose that $E \in \bigcap_{n} \sigma(X_D, X_{h_n}, X_{h_{n+1}}, \ldots)$. Then $E \in \sigma(X_D, X_{h_1}, X_{h_2}, \ldots)$, so it is the case that
$1_E = \lim_{n \to \infty} \mathbb{P}(E \mid X_D, X_{h_1}, \ldots, X_{h_n})$. But, by the conditional independences $X_{h_i} \perp\!\!\!\perp X_{h_j} \mid X_D$, it is the case
that $(X_{h_1},\ldots,X_{h_n}) \perp\!\!\!\perp (X_{h_{n+1}}, X_{h_{n+2}}, \ldots) \mid X_D$, and so since $E \in \sigma(X_D, X_{h_{n+1}}, \ldots)$, we have for all $n$ the
equality $\mathbb{P}(E \mid X_D, X_{h_1}, X_{h_2}, \ldots, X_{h_n}) = \mathbb{P}(E \mid X_D)$ with probability one. Thus, with probability one, we have
$1_E = \mathbb{P}(E \mid X_D)$. Thus, if $E' = \{\mathbb{P}(E \mid X_D) = 1\}$, then $E' \in \sigma(X_D)$ and $\mathbb{P}(E \,\Delta\, E') = 0$, concluding the
proof. □

Remark 5.10. The infinite star graph is an example of the kind of graph to which Proposition 5.9 could be
applied. For this example, we could take $D$ to be the hub node, and $H$ to contain all of the remaining nodes.


6. Graphical Gaussian Processes 

Two of the most important classes of graphical models that have been studied in the literature are
Gaussian graphical models and discrete log-linear models. In this section and the one that follows, we give a
comprehensive treatment of their countably infinite analogues and important special cases thereof.

6.1. Infinite-Dimensional Gaussian Graphical Models and Properties. We begin by introducing some 
definitions and notation. 

Definition 6.1.

• A Gaussian process is a collection of random variables $(X_i)_{i \in I}$ such that each $X_i$ is a normal random
variable, and every finite collection of the $X_i$ has a multivariate normal distribution. We will assume
in this paper that in a Gaussian process, all of the random variables have mean zero. For any subset
$A \subseteq I$, we use $X_A$ to denote the random variable $(X_a)_{a \in A}$.

• For $A,B \subseteq I$, we use $\Sigma_{AB}$ to represent the cross-covariance matrix of $X_A$ and $X_B$. We also write $\Sigma_A$ to
represent $\Sigma_{AA}$.

• If $A,B \subseteq I$ are finite and disjoint, we use $\Sigma_{A|B}$ to represent the covariance matrix of $X_A$ given $X_B$. It
is well known that when $\Sigma_B$ is invertible,

$\Sigma_{A|B} = \Sigma_A - \Sigma_{AB}\,\Sigma_B^{-1}\,\Sigma_{BA}.$

• For a finite-dimensional matrix $M$, we use the notation $\lambda_1(M)$ to represent the largest eigenvalue
of $M$.

We now introduce a theorem which is a critical ingredient in the verification of (IIP) and (DCP) for 
Gaussian processes. 


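Theorem 6.2. Let $(X_n)_{n \in \mathbb{N}}$ be a Gaussian process which satisfies bounds (1) and (2) given in the statement
of Theorem 2.5. Then for any finite $A \subseteq \mathbb{N}$ of size $r$ and any (possibly infinite) $B \subseteq \mathbb{N}$ disjoint from $A$, there
is a $\sigma(X_B)$-measurable function and an $r \times r$ symmetric, positive-definite matrix such that, conditionally on
$X_B$, the vector $X_A$ is multivariate normal with this function as its mean and this matrix as its covariance.

Proof. This theorem is central to ensuing arguments, but due to its technical nature and length, the proof is
provided in Appendix A. □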

Lemma 6.3. Under the assumptions of Theorem 2.5, the Gaussian process $(X_n)_{n \in \mathbb{N}}$ satisfies (IIP).

Proof. The proof follows by exploiting the existence of a density using standard techniques. Details are
provided in the Supplemental section. □

Remark 6.4. The proof of Lemma 6.3 amounts to verifying an only slightly extended version of the
finite-dimensional intersection property (P5), since the only difference between (P5) and (IIP) lies in potentially
conditioning on an infinite-dimensional random variable. For more on the conditions under which the
intersection property holds in the finite-dimensional case, see jTOl.

Lemma 6.5. Under the assumptions of Theorem 2.5, the Gaussian process $(X_n)_{n \in \mathbb{N}}$ satisfies the
decorrelation property (DCP).


Proof. We will appeal to Lemma 5.8. Let $D \subseteq \mathbb{N}$, $X_D = x_D$, and $m \in \mathbb{N}$ be arbitrary. We need to show
that there is a function $g_{m,D,x_D}(n)$ with $g_{m,D,x_D}(n) \to 0$ which satisfies the property that for any
$E \in \mathcal{F}_{D:n:N} := \sigma(X_D, X_n, X_{n+1}, \ldots, X_N)$, we have

$\operatorname{Var}\big(\mathbb{P}(E \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big) \le g_{m,D,x_D}(n)$ a.s. $X_D$.

By Theorem 6.2, after conditioning on $X_D = x_D$ we know that $(X_1,\ldots,X_m,X_n,\ldots,X_N)$ has some conditional
multivariate normal distribution. Let $E \in \mathcal{F}_{D:n:N}$ (with $n > m$). Then we have that


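$\operatorname{Var}\big(\mathbb{P}(E \mid X_1,\ldots,X_m,X_D) \,\big|\, X_D = x_D\big)$

$= \mathbb{E}\Big[\big(\mathbb{P}(E \mid X_1,\ldots,X_m,X_D) - \mathbb{E}[\mathbb{P}(E \mid X_1,\ldots,X_m,X_D) \mid X_D = x_D]\big)^2 \,\Big|\, X_D = x_D\Big]$

$\le \mathbb{E}\Big[\big(\mathbb{P}(E \mid X_1 = x_1,\ldots,X_m = x_m,X_D) - \mathbb{P}(E \mid X_1 = 0,\ldots,X_m = 0,X_D)\big)^2 \,\Big|\, X_D = x_D\Big]$

(6.1) $= \mathbb{E}\Big[\Big(\int_0^1 \frac{d}{dt}\Big|_{t=y} \mathbb{P}\big(E \mid (X_1,\ldots,X_m) = t(x_1,\ldots,x_m), X_D = x_D\big)\, dy\Big)^2 \,\Big|\, X_D = x_D\Big].$

Consider the quantity

$\frac{d}{dt}\Big|_{t=y} \mathbb{P}\big(E \mid (X_1,\ldots,X_m) = t(x_1,\ldots,x_m), X_D = x_D\big).$

If we let $f_{x_D}$ denote the corresponding multivariate normal density, we have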


— |f=vP(£'|(Xl,...,Xm) = t{xi,...,Xm),XD=XD) 

“u|f=v / fx[)(Xni ■■■iXX!^tX\^ ...,tXm)dXfi...dxi\l 

dt ■ Je 

I “77 |l=v/xo {^ni • • j tXm)dXfi-- ■dxx! 

JE dt 

J^fxoiXn,-,XN,yXu...,yX,„)^^\t=y {vxo+t ^njXj, Cn jX j 


(6.2) 


( j m m \ \ \ 

Vxn +t ( L j j j dXn...dXN, 


where $v_{x_D}$ is an $N - n + 1$ dimensional vector depending on $x_D$, $n$, and $N$, which represents the total
contribution of $X_D$ to the mean of $(X_n,\ldots,X_N)$ given $X_D = x_D$, and $c_{ij}$ is the factor by which $X_j$ contributes to the
mean of $X_i$. More explicitly, we have


Vxd — 

with L,.DXj)j) defined as a whole by the same definition as in the proof of Theorem 

(6-3) (vxo)r= £ 


6.2 


i.e. 


Xd, 


di^D 


d2eB 


and 


(6.4) 


= O \^go{i,k)gi{k,j)j = 0{g2{iJ)). 


Note that the double sum from line ( |6.3| ) is convergent since = 0{go{r,di)), and 

<\(yd,d2\+Y.\^dik(ykd2\ =0{gi{dud2)) 


and so 


iXxo)r = 


o{ £ Y, Soir,di)gi{di,d2)xd2 

\d 1 eDd 2 eD 


(6.5) 

(6.6) 


= Oi Yj S2{r,d2)xd2 

\d2€B 

— OxD,r{i') a.s. iXe 


where here and elsewhere the subscript on the $O$ means that the implicit constant may depend on the
subscripted variables, and we have obtained line (6.6) by noting that Xd 2 = 61((f|) a.s. Xd from Proposition
The presence of the density in the expression from line (6.2) gives


= E 


d 

/ 


( 


- 1 ,. 


xd 


+ t £c„jxy,...,£cw,. 


J^j 


^7=1 


7=1 




y~' CAf, 


7-^7 


(6.7) 


\7=1 7=1 

{Xi,...,Xm) =yixi,...,Xm),XD=XD ■ 


Letting $(\Sigma^{-1})_{ij} = a_{ij}$, the quantity in the derivative above is equal to


N N 




7-^7 


' k=nt=n 
N N 


7=1 

m 


7=1 


2 /V fv j j m m 


' k—ni—n 


7=1 


7=1 


where C represents a constant term not depending on t, and therefore we have that the derivative with respect 
to t at t = y is given by 

“ 6 L L £ CQXj + v^^^i £ CkjXj + 2y £ CkjXj £ cijxj 


' k—ni—n 


7=1 


7=1 


V7=l 


V7=l 


Thus, incorporating the results of lines (6.4) and (6.5), the derivative from line (6.7) is bounded in absolute
value by


O ( Y.'LsiikJ) ( X 82 {k,d 2 )\xd 2 \ ( £ \g2{ij)^ 

\k=n(.=n 


\d 2 eD 


V7=l 


+ £ \g 2 {k,j)xj\ £ \g 2 ii,h)xh\ 


w=i 

00 00 m 


V/!=I 


\ 


i—nd£D j—l 
00 00 m m 


( 6 . 8 ) 


+ IIIL g2(7,%i(^,%2(A/i)(max 


k—ni—n j—lh—\ 


j<n 


Now, consider the quantity from line (6.8) with $n = 1$ and $N = \infty$. Applying the recursive definition of
$g_n$, it satisfies


(6.9) 


= <9 £§4(<7,;)krf|max|x;| + g 4 {j,h)max 


^deD j=l 


i<n 


j=\h=l 


j<rt 


VI ’ 


where both double sums are convergent by our assumptions. Thus, as $n \to \infty$ in line (6.8), the sums decrease
to zero (since as $n \to \infty$ all terms are removed from the sum), and so we have demonstrated that for any fixed
$m$, the derivative in question is bounded by a quantity of the form

$o_n(1)\,\Big(\max_{j \le m} |x_j| + \big(\max_{j \le m} |x_j|\big)^2\Big)$














DAVID MONTAGUE AND BALA RAJARATNAM 


(where $o_n(1)$ means as $n \to \infty$). Combining this with line (6.1) gives

\w{¥{E\X,,...,X^,Xd)\Xd=xd) 





VI 

—\t=y'PiE\{Xl,...,Xm) =t{xi,...,Xm),XD = XD)dy j 

-1 

II 


= o(l)£'(max|;i:y|) + {mdLx\x j\f\XD =xd], 

j<m 


demonstrating the existence of the desired function $g_{m,D,x_D}(n)$. □


We are now in a position to immediately prove Theorem 2.5.

Proof of Theorem 2.5. Combining Theorem 5.7 with Lemmas 6.3 and 6.5 allows us to deduce (P5*),
concluding the proof of Theorem 2.5. □

Remark 6.6. The proofs of Theorem 6.2 and Lemma 6.3 only require assumption (2) on $g_n$ for $n \le 2$, but
the proof of Lemma 6.5 requires this assumption up to $n = 4$.


6.2. Applications and Examples of Infinite-Dimensional Gaussian Graphical Models. We now
illustrate the broad applicability of the theory developed above by means of three examples. To do so, we verify
(P5*) by checking the conditions of Theorem 2.5. These entail (1) a uniform eigenvalue bound on the
covariance matrices of the marginals, and (2) a covariance decay condition. For a given class of models,
the decay condition is relatively straightforward to establish as compared to the eigenvalue condition. As a
result, we will emphasize the verification of the eigenvalue condition in the examples below.


Example 6.7. (Infinite-dimensional autoregressive model.)

Let $(X_n)_{n \in \mathbb{N}}$ be a collection of random variables defined by

$X_n = \sum_{j=1}^{N} p_{nj} X_{n-j} + \varepsilon_n,$

where $X_0, X_{-1}, \ldots, X_{-N+1} = 0$, and the $(\varepsilon_n)_{n \in \mathbb{N}}$ are i.i.d. $\mathcal{N}(0,1)$. Assume that there exists $\delta > 0$ such that
for all $n$ we have $\sum_{j=1}^{N} |p_{nj}| < 1 - \delta$.

The conditions of Theorem 2.5 can be verified for these $X_n$, and so the $X_n$ satisfy properties (IIP), (DCP),
and (P5*). It can also be shown that they satisfy (P*) with respect to the graph $\mathcal{G} = (V,E)$ on $\mathbb{N}$ for which
$(i,j) \in E$ iff $|i-j| \le N$. Theorem 2.3 thus implies that the $X_i$ satisfy (G*) with respect to this graph as well.
The technical details are contained in the Supplemental section.


Example 6.8. (Diagonally dominant Gaussian processes.)

Let $(X_n)_{n \in \mathbb{N}}$ be a Gaussian process and let $\sigma_{ij} = \operatorname{Cov}(X_i,X_j)$. Suppose the $\sigma_{ij}$ are uniformly diagonally
dominant, in the sense that there exists some $\varepsilon > 0$ such that

(6.10) $|\sigma_{ii}| \ge \varepsilon + \sum_{j \ne i} |\sigma_{ij}| \quad \forall i.$

Suppose also that there is a constant $C$ such that

(6.11) $|\sigma_{ii}| \le C \quad \forall i,$

and that the function $g_0(i,j) := |\sigma_{ij}|$ and the corresponding $g_1, g_2, g_3, g_4$ satisfy the bounds of Theorem
2.5, i.e. the easily verifiable covariance decay condition alluded to above.

Combining the uniform diagonal dominance with the Gershgorin circle theorem, we obtain the lower
bound of $\varepsilon$ on all eigenvalues of $\Sigma_n$ for any $n$. In addition, conditions (6.10) and (6.11) combine to show that
$2C$ is an upper bound on all row sums of $\Sigma_n$ for any $n$ (where $\Sigma_n$ is the covariance matrix of $(X_1,\ldots,X_n)$), and
so is an upper bound on all eigenvalues of $\Sigma_n$ for any $n$. Thus, the conditions of Theorem 6.2 are satisfied,
and so the $(X_n)_{n \in \mathbb{N}}$ satisfy (P5*).

Conditions (6.10), (6.11) and the condition on the $g_i$ are satisfied, for example, if each $n \in \mathbb{N}$ has integer
coordinates $c(n) \in \mathbb{Z}^m$, and the covariance between $X_i$ and $X_j$ is determined by applying the powered
exponential covariance function to $d(c(i),c(j))$, the Euclidean distance between $c(i)$ and $c(j)$, for certain values
of the covariance function parameters. More explicitly, this is the case if the $\sigma_{ij}$ satisfy

$\sigma_{ij} = \exp\big(-d(c(i),c(j))^{\alpha}/\Gamma\big),$

where $0 < \alpha < 2$ and $\Gamma$ is positive and sufficiently small (depending on $\alpha$). For more information on this
family of covariance functions, known as the powered exponential family, see [24].

For the technical details of this example, see the Supplemental section.
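
The Gershgorin argument of Example 6.8 can be tried numerically on a finite section. The Python sketch below builds a powered exponential covariance on a $5 \times 5$ block of integer points and computes the Gershgorin lower bound (diagonal entry minus off-diagonal absolute row sum) and the largest absolute row sum. The function names, the grid size, and the parameter values $\alpha = 1$ and scale $0.25$ are assumptions chosen for illustration; they are not values from the paper, and a finite check of this kind does not replace the uniform bounds required for all $n$.

```python
import math


def powered_exp_cov(points, alpha, gamma):
    """Covariance matrix sigma_ij = exp(-d(p_i, p_j)**alpha / gamma)."""
    return [[math.exp(-math.dist(p, q) ** alpha / gamma) for q in points]
            for p in points]


def gershgorin_bounds(S):
    """Return (lower, upper): every eigenvalue of the symmetric matrix S lies
    in [lower, upper] by the Gershgorin circle theorem."""
    n = len(S)
    lower = min(S[i][i] - sum(abs(S[i][j]) for j in range(n) if j != i)
                for i in range(n))
    upper = max(sum(abs(v) for v in row) for row in S)
    return lower, upper


# Illustrative finite section: a 5 x 5 block of the integer lattice Z^2.
points = [(i, j) for i in range(5) for j in range(5)]
S = powered_exp_cov(points, alpha=1.0, gamma=0.25)
lower, upper = gershgorin_bounds(S)
C = max(S[i][i] for i in range(len(S)))   # diagonal bound, as in (6.11)
print(lower > 0, upper <= 2 * C)
```

With these illustrative parameters the covariances decay quickly enough that the finite section is strictly diagonally dominant, so the lower bound is positive and the largest row sum stays below $2C$.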

Example 6.9. (Gaussian lattice model.)

Let $(X_p)_{p \in \mathbb{Z}^m}$ be a Gaussian process indexed by $\mathbb{Z}^m$ (i.e., on a lattice) satisfying

$\sigma_{p_1 p_2} = \operatorname{Cov}(X_{p_1},X_{p_2}) = \exp\big(-d(p_1,p_2)^2/\Gamma\big),$

where the function $d$ represents Euclidean distance, and $\Gamma > 0$ is arbitrary (so that the covariances are
specified by a Gaussian covariance function). Note that unlike the previous example, the above covariances
are in general not diagonally dominant.

The Gaussian process with the above covariance function can be shown to satisfy (IIP), (DCP), and (P5*)
by verifying the conditions of Theorem 2.5. For the technical details, see Appendix A.

Remark 6.10. Example 6.9 is intended to demonstrate that many processes of interest satisfy (IIP), (DCP),
and (P5*). For such processes, if (P*) is satisfied, then (G*) will be satisfied, allowing the notion of an
infinite-dimensional graphical model. However, we wish to clarify that our intention in this example is not
necessarily to obtain a process for which (P*) is satisfied.

7. Graphical Discrete Processes 

7.1. Graphical Models for Infinitely Many Discrete Random Variables. 

Definition 7.1. A collection of random variables $\{X_i \mid i \in I\}$ is called a discrete process provided that each
$X_i$ takes values in $\{0,1\}$.

Remark 7.2. While we restrict our attention in this paper to discrete processes taking on only two values, 
standard arguments allow the following results to be extended to the setting where the random variables may 
take one of r distinct values for any finite r > 2. 

We now prove Theorem 2.8, which is the central result of this section.

Proof of Theorem 2.8. We first use property (1) to obtain a positive density, which is used to prove (IIP).

Let $A,B,C,D$ be disjoint subsets of $\mathbb{N}$, and let $A,B,C$ be finite. If $D$ is finite, assumption (1)
implies that $(X_A,X_B,X_C,X_D)$ has an everywhere-positive density with respect to the counting measure on
$\{0,1\}^{|A|+|B|+|C|+|D|}$, and so (P5) will hold. Thus, we may assume $D = \{d_1, d_2, \ldots\}$ is infinite. Let $m = |A| + |B| + |C|$.

We have by assumption that

$c_1 \le \mathbb{P}\big((X_A,X_B,X_C) = (a_1,\ldots,a_m) \mid X_{d_1} = b_1,\ldots,X_{d_r} = b_r\big)$

for all $r$ and any $\{0,1\}$-valued sequence $(b_n)_{n \in \mathbb{N}}$. By the Lévy zero-one law,

$\lim_{r \to \infty} \mathbb{P}\big((X_A,X_B,X_C) = (a_1,\ldots,a_m) \mid X_{d_1},\ldots,X_{d_r}\big) = \mathbb{P}\big((X_A,X_B,X_C) = (a_1,\ldots,a_m) \mid X_D\big)$

with probability 1 with respect to the marginal distribution, and therefore $\mathbb{P}$-almost surely we have

$\mathbb{P}\big((X_A,X_B,X_C) = (a_1,\ldots,a_m) \mid X_D\big) \ge c_1.$

Note that since we only need the above line to hold with respect to the marginal distribution, it is enough
that it hold for $\mathbb{P}$-a.e. value of $X_D$.

From here, the argument verifying (IIP) is similar to the proof of Lemma 6.3.

Lemma 5.8 shows that (DCP) holds under the second assumption, and Theorem 5.7 verifies (P5*),
finishing the argument. □


7.2. Applications and Examples of Infinite-Dimensional Discrete Graphical Models. 

Example 7.3. (Two-state Markov Chain.) 

Let $(X_n)_{n \in \mathbb{N}}$ be a $\{0,1\}$-valued Markov chain, with transition probabilities

$p_n = \mathbb{P}(X_{n+1} = 1 \mid X_n = 1)$ and $t_n = \mathbb{P}(X_{n+1} = 1 \mid X_n = 0).$

Suppose that $0 < \mathbb{P}(X_1 = 1) < 1$, and that $0 < p_n, t_n < 1$ for all $n \in \mathbb{N}$. Suppose also that

oo 

n—\ 

Then the $\{X_n \mid n \in \mathbb{N}\}$ satisfy properties (IIP) and (DCP), and hence (P5*).

Proof. The proof is an application of Theorem 2.8 and is provided in the Supplemental section. □


In the previous example, the collection of random variables was assumed to have graphical structure (as 
it forms a Markov chain). We now provide an example which is more general in the sense that it makes 
no assumption of a graphical relationship between the variables in the discrete process, and which will be 
useful in Example 7.7 below.

Example 7.4. (Sparse Countable Sequences.) 

Let $(X_n)_{n \in \mathbb{N}}$ be a $\{0,1\}$-valued discrete stochastic process, and suppose the following:

(1) for all $m \in \mathbb{N}$, there exists $\varepsilon_m > 0$ such that for any $\{0,1\}$-valued sequence $(i_1, i_2, \ldots)$ and any
finite $B \subseteq \mathbb{N}$ disjoint from $\{1,\ldots,m\}$,

$\mathbb{P}\big((X_1,\ldots,X_m) = (i_1,\ldots,i_m) \mid X_B = x_B\big) \ge \varepsilon_m,$ and

(2) for $\tau(\omega) := \#\{i : X_i(\omega) = 1\}$, we have $\tau < \infty$ almost surely.

Then the $\{X_n \mid n \in \mathbb{N}\}$ satisfy properties (IIP) and (DCP).

Proof. See Appendix A. □

Remark 7.5. By exchanging $X_i$ with $X_i' = 1 - X_i$ (for those $i$ with $z_i = 1$) for any $\{0,1\}$-valued sequence
$z_1, z_2, \ldots$, the above result can be generalized to the case where $\#\{i : X_i \ne z_i\} < \infty$ almost surely.

7.3. The Ising Model. 

The Ising model is a well-known and well-studied family of models of finitely many discrete random
variables with graphical structure ifT^ [20]. Even infinite-dimensional versions of the Ising model have been
well-studied [8], but much of this study has focused on lattice-based models and physics applications.

Our goal in this subsection is to obtain a graph-based infinite-dimensional generalization of the Ising 
model distribution with the ultimate goal of verifying (IIP) and (DCP), and also (P*) with respect to some 
nontrivial graph. In order to obtain this generalization in a rigorous manner, we must first establish a number 
of results about limits of the finite Ising model distributions. Existence and uniqueness results for related 
distributions have been established by other authors (e.g., as in 161), but for the most part these results 
have relied on a lattice-based framework rather than one based on general graphs, as is our goal. Hence 
we provide a self-contained development of an infinite-dimensional generalization of the Ising model in our 
graph-based framework. Our primary purpose in this section is to demonstrate examples of infinite graphical
models by appealing to Theorem 2.8. The proofs of the necessary results establishing the existence of the
infinite Ising model and other properties thereof are provided in the Supplemental section.
We begin by rigorously introducing the finite-dimensional Ising model.

Definition 7.6. The Ising model consists of a discrete process $X_1, \ldots, X_n$ along with an additional variable $X_0 = 1$. The set of parameters is given by $\theta = (\theta_{ij})_{(i,j) \in E}$, where $E$ is the edge set of a graph $\mathcal{G} = (V, E)$ (with $V = \{0, 1, \ldots, n\}$) such that every nonzero node has an edge to zero. The $X_i$ are then distributed according to the unnormalized density

\[ U(X = (x_1, \ldots, x_n)) := \exp\Big( \sum_{(i,j) \in E} \theta_{ij} x_i x_j \Big). \tag{7.1} \]

Since $U(x)$ is finite for any choice of $x$, this induces a (normalized) probability distribution $P \propto U$ on $\{0,1\}^n$.



It is straightforward to verify from line (7.1) above that for this distribution,

\[ P(X_j = 1 \mid X_{-j} = x_{-j}) = \frac{1}{1 + \exp\big(-\theta_{j0} - \sum_{k \neq 0} \theta_{jk} x_k\big)}, \tag{7.2} \]

where $X_{-j}$ represents the vector containing all of the $X_i$ except $X_j$. From line (7.2) it is clear that the (finite) Ising model satisfies the Markov property (L), and therefore also (P), with respect to its graph $\mathcal{G}$. In the finite case, Corollary 4.2 ensures that this model satisfies property (G) as well.
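The logistic form of the conditional distribution in line (7.2) can be checked by brute force on a small graph. The following is a minimal sketch, not part of the original argument: the graph, the parameter values in `theta`, and all function names are hypothetical choices made only for illustration. It compares the conditional probability computed directly from the unnormalized density (7.1) with the closed form (7.2).

```python
import itertools
import math

# Hypothetical toy parameters (not from the paper). Node 0 is the constant X_0 = 1,
# and every nonzero node has an edge to node 0, as Definition 7.6 requires.
theta = {(0, 1): -0.3, (0, 2): 0.5, (0, 3): 0.1, (1, 2): 0.8, (2, 3): -1.2}
n = 3


def unnormalized_density(x):
    """U(x) from line (7.1), with x = (x_1, ..., x_n) and x_0 = 1 prepended."""
    full = (1,) + tuple(x)
    return math.exp(sum(t * full[i] * full[j] for (i, j), t in theta.items()))


def conditional_by_enumeration(j, x):
    """P(X_j = 1 | X_{-j} = x_{-j}) as U(x_j = 1) / (U(x_j = 1) + U(x_j = 0))."""
    x_on = tuple(1 if k == j else v for k, v in enumerate(x, start=1))
    x_off = tuple(0 if k == j else v for k, v in enumerate(x, start=1))
    u_on, u_off = unnormalized_density(x_on), unnormalized_density(x_off)
    return u_on / (u_on + u_off)


def conditional_by_formula(j, x):
    """Right-hand side of line (7.2): logistic in theta_{j0} plus the neighbor terms."""
    full = (1,) + tuple(x)
    s = sum(t * full[b if a == j else a] for (a, b), t in theta.items() if j in (a, b))
    return 1.0 / (1.0 + math.exp(-s))


# Largest disagreement between the two computations over all configurations and nodes.
max_gap = max(
    abs(conditional_by_enumeration(j, x) - conditional_by_formula(j, x))
    for x in itertools.product((0, 1), repeat=n)
    for j in range(1, n + 1)
)
```

Because the closed form depends on $x$ only through the neighbors of $j$, the same computation also illustrates the local Markov property: in the toy graph above, nodes 1 and 3 are not adjacent, so the conditional for node 1 does not change when $x_3$ is flipped.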

Example 7.7. For appropriate $\theta$ it is possible to directly normalize the measure obtained from line (7.1) even if the number of variables is infinite. In particular, if $\sum_{x \in \{0,1\}^\infty} U(x) < \infty$, then $P$ can be obtained by normalization (for this to make sense, we define $\exp(-\infty) = 0$). We remark that if this sum is finite, then at most countably many terms can be nonzero, and so $P$ will be a discrete measure.

Suppose that the graph $\mathcal{G} = (\mathbb{N} \cup \{0\}, E)$ is such that every node except the zero node has finite degree. Let $\theta$ be arbitrary such that $\theta_{jk} < 0$ for all $j$ and $k$, and so that $\theta_{k0} < -2 \log k$. Then the $X_i$ obtained from the measure defined in line (7.1) satisfy (IIP) and (DCP), and therefore (P5*) by Theorem 5.7. In addition, it can be shown that these $X_i$ satisfy (P*) with respect to the induced subgraph of $\mathcal{G}$ excluding the zero node, and thus also satisfy (G*) with respect to this graph by Theorem 2.3. For the technical details of this claim, which involve an application of the result of Example 7.4, see Appendix A.

In the above example, we relied upon the convergence of the sum of the $U(x)$. We now give a more general treatment of our infinite-dimensional Ising model.


Definition 7.8. Let $\mathcal{G} = (V, E)$ be a graph with vertex set $V = \{0\} \cup \mathbb{N}$ such that every nonzero node has an edge to zero. Let $\theta = (\theta_{ij})_{(i,j) \in E}$. Let $\mathcal{G}_n$ be the induced subgraph on $\{0, 1, \ldots, n\}$.

Define the distribution $P_n$ on $\{0,1\}^n$ to be the Ising model distribution of $X_1, \ldots, X_n$ with graph $\mathcal{G}_n$ and the same parameter set $\theta$. Given $1 < m < n$, define $P_n^m$ to be the marginal distribution of $X_1, \ldots, X_m$ under $P_n$. If for all $v \in \{0,1\}^m$ the limit

\[ \lim_{n \to \infty} P_n^m\big((X_1, \ldots, X_m) = v\big) \tag{7.3} \]

exists, we define the distribution $P^m$ on $\{0,1\}^m$ to be given by this limit. That is,

\[ P^m\big((X_1, \ldots, X_m) = v\big) = \lim_{n \to \infty} P_n^m\big((X_1, \ldots, X_m) = v\big). \]

Finally, we define the infinite Ising model with graph $\mathcal{G}$ and parameter set $\theta$ to be the distribution on $\{0,1\}^\infty$ with the infinite product $\sigma$-algebra which has the finite-dimensional distributions provided by the $P^m$ and their marginalizations.


For ease of notation, we will assume that for $(i,j) \notin E$ we have $\theta_{ij} = 0$, so that we need not specify that sums be taken only over edges $(i,j) \in E$.
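To make the limiting construction in Definition 7.8 concrete, the following sketch computes the marginals $P_n^m$ by brute-force enumeration for increasing $n$ and records how they stabilize. It is only an illustration under stated assumptions: the chain-shaped graph and the parameter choice in `theta` (fields $1/j^2$ on the edges to node 0, couplings $-1/j^2$ between consecutive nodes) are hypothetical and were picked so that the parameters are absolutely summable.

```python
import itertools
import math


def theta(i, j):
    """Hypothetical parameters: 1/j^2 on edges (0, j), -1/i^2 on chain edges (i, i+1).

    Non-edges get theta = 0, following the convention stated after Definition 7.8.
    """
    i, j = min(i, j), max(i, j)
    if i == 0:
        return 1.0 / j**2
    if j == i + 1:
        return -1.0 / i**2
    return 0.0


def marginal(n, v):
    """P_n^m((X_1, ..., X_m) = v) by enumerating {0,1}^n, with X_0 = 1 fixed."""
    m = len(v)
    num = den = 0.0
    for x in itertools.product((0, 1), repeat=n):
        full = (1,) + x
        exponent = sum(
            theta(i, j) * full[i] * full[j]
            for i in range(n + 1)
            for j in range(i + 1, n + 1)
        )
        weight = math.exp(exponent)
        den += weight
        if x[:m] == v:
            num += weight
    return num / den


# Marginal probability of (X_1, X_2) = (1, 0) under P_n for n = 3, ..., 11.
v = (1, 0)
values = {n: marginal(n, v) for n in range(3, 12)}
increments = {n: abs(values[n] - values[n - 1]) for n in range(4, 12)}
```

In this toy setting the increments shrink quickly as $n$ grows, which is the behavior the limit in line (7.3) requires; the enumeration is exponential in $n$ and is meant only for small cases.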

Example 7.9. We now present another infinite extension of the Ising model. Assume that $\mathcal{G}$ is a graph with vertex set $\mathbb{N} \cup \{0\}$ such that every node has finite degree except 0, and $\theta = (\theta_{ij})_{(i,j) \in E}$ satisfies

\[ \sum_{(i,j) \in E} |\theta_{ij}| < \infty. \]

Under this assumption, there is a limiting joint distribution of the $X_i$ which generalizes the finite-dimensional Ising model, and moreover this distribution is such that

\[ f(X) = \sum_{i=1}^{\infty} 2^{-i} X_i \]

has a density with respect to Lebesgue measure on $[0,1]$. In addition, (IIP), (DCP), and (P5*) are satisfied by these variables. These variables also satisfy (P*) with respect to $\mathcal{G}$, and so satisfy (G*) as well. Details are provided in Appendix A.


Remark 7.10. It is also possible to demonstrate the existence of the limiting distribution from Definition 7.8 for other examples of $\theta$ and $\mathcal{G}$. However, we have restricted our attention to the choices of $\theta$ and $\mathcal{G}$ from Examples 7.7 and 7.9 for the sake of brevity, and also because our goal is to provide sufficient demonstration of the applicability of Theorem 2.8.


Remark 7.11. Many of the above ideas can also be extended to the case of the generalized log-linear model, where for a given graph $\mathcal{G}$,

\[ \log P(X) = \sum_{A \text{ a clique of } \mathcal{G}} f_A(X_A). \]

In particular, similar arguments demonstrate that graphical models satisfying (IIP) and (DCP) can be defined when either $\sum_{A \text{ a clique of } \mathcal{G}} f_A(X_A) = -\infty$ for all but countably many choices of $X$, or when the sum $\sum_{A \text{ a clique of } \mathcal{G}} |f_A(X_A)|$ is uniformly bounded over all values of $X$.

Remark 7.12. We note here that Georgii [8] introduces and studies in great detail an infinite-dimensional 
formulation of the Ising model in which only the conditional distributions for the model are specified, rather 
than a joint distribution or any marginals. Georgii takes this approach with the end goal of explaining 
physical phenomena. For example, he demonstrates that the non-uniqueness of a joint distribution with 
certain specified conditional distributions can be associated with phase transitions. We are, however, more 
interested in working with a specific joint distribution and studying its conditional independences. This is 
why we have provided our own self-contained formulation of an infinite-dimensional generalization of the 
Ising model which actually determines a specific joint distribution with the desired conditional distributions. 


Acknowledgements: We thank Apoorva Khare for reading a draft of the manuscript and for giving useful 
suggestions. We also thank Amir Dembo for providing a useful reference on the Ising model and discussions 
about the background to the paper.

Appendix A. Proofs of Main Results


A.1. Proofs from Section 6. Proof of Theorem 6.2. Suppose that $A \subset \mathbb{N}$ is any finite collection of nodes. The claim is trivial when $B$ is finite, so let $B = \{b_1, b_2, \ldots\} \subset \mathbb{N}$ be an infinite collection of nodes, and without loss of generality, assume that $b_1 < b_2 < \cdots$. We will use the notation $B_n = \{b_1, \ldots, b_n\}$.

By the formulas for the conditional distribution of the multivariate normal distribution, we have

\[ X_A \mid X_{B_n} \sim \mathcal{N}\big(\mu_A + \Sigma_{AB_n} \Sigma_{B_nB_n}^{-1} (X_{B_n} - \mu_{B_n}),\ \Sigma_{A|B_n}\big). \tag{A.1} \]

Thus, we have for any event $E \in \sigma(X_A, X_{B_n})$ that

\[ P(E) = \int_E f_n(x_A, x_{b_1}, \ldots, x_{b_n}) \, dx_A \, d\nu(x_B), \tag{A.2} \]

where $f_n(x_A, x_{B_n})$ is the density of the distribution from line (A.1) (where $X_{B_n}$ is given), and where $\nu$ is the probability measure determining the distribution of $X_B = (X_{b_1}, \ldots)$.

Combining the eigenvalue bounds with the Cauchy interlacing theorem for Schur complements (as in [25], for example), we obtain that $\Sigma_{A|B_n}$ is a sequence of matrices whose eigenvalues are all uniformly bounded above by $C$, and so the entries are as well. Thus, each entry is confined uniformly in $n$ to a closed and bounded (and therefore sequentially compact) interval. Thus there is a subsequence $n_j$ such that $\Sigma_{A|B_{n_j}}$ converges to a matrix $\Sigma_{A|B}$. (For this argument, it will not be necessary to verify that $\Sigma_{A|B}$ is independent of the choice of the $n_j$.)

We now consider the conditional mean. Let $K_n$ be the diagonal matrix of the conditional standard deviations, i.e., $K_n$ is diagonal with diagonal entries given by

\[ (K_n)_{ii} = \sqrt{\mathrm{Var}(X_{b_i} \mid X_{b_1}, \ldots, X_{b_{i-1}}, X_{b_{i+1}}, \ldots, X_{b_n})}. \]

Note that combining the Cauchy interlacing theorem (again for Schur complements) with the eigenvalue bounds yields $\sqrt{c} \le (K_n)_{ii} \le \sqrt{C}$. Now, consider the matrix

\[ \Sigma_{AB_n} \Sigma_{B_nB_n}^{-1} K_n = \Sigma_{AB_n} K_n^{-1} \, K_n \Sigma_{B_nB_n}^{-1} K_n. \]

As discussed in section 5.1.3 of [14], the diagonal entries of $K_n \Sigma_{B_nB_n}^{-1} K_n$ are 1, and the nondiagonal $(i,j)$ entry of this matrix is the negative of the partial correlation of $X_{b_i}$ and $X_{b_j}$ given the remaining $X_{b_k}$, and so is bounded by a constant multiple of the conditional covariance (where this multiple depends on the bounds on the conditional variances provided by $c$ and $C$), giving us more specifically that

\[ (K_n \Sigma_{B_nB_n}^{-1} K_n)_{ij} = O\Big( g_0(b_i, b_j) + \sum_{k=1}^{n} g_0(b_i, k)\, g_0(k, b_j) \Big) = O(g_1(b_i, b_j)). \]

Note that the above argument shows that all conditional covariances are bounded above by a constant multiple of $g_1$ (where this constant multiple depends on $c$).

Next, we combine the fact that the entries of $K_n^{-1}$ are bounded above by $1/\sqrt{c}$ and that $(\Sigma_{AB_n})_{i,j} \le g_0(a_i, b_j)$ to obtain $(\Sigma_{AB_n} K_n^{-1})_{ij} = O(g_0(a_i, b_j))$.

Thus, we have the bound

\[ \begin{aligned} \big| (\Sigma_{AB_n} \Sigma_{B_nB_n}^{-1} K_n)_{ij} \big| &= \Big| \sum_{k=1}^{n} (\Sigma_{AB_n} K_n^{-1})_{ik} (K_n \Sigma_{B_nB_n}^{-1} K_n)_{kj} \Big| \\ &= O\Big( \sum_{k=1}^{n} g_0(a_i, k)\, g_1(k, b_j) \Big) \\ &= O(g_2(a_i, b_j)), \end{aligned} \tag{A.3} \]

where we have used the fact that $g_n(i,j)$ is increasing in $n$.

Because a countable product of sequentially compact spaces is sequentially compact, there is some subsequence $n_{j_k}$ of the $n_j$ such that each entry of $\Sigma_{AB_{n_{j_k}}} \Sigma_{B_{n_{j_k}}B_{n_{j_k}}}^{-1} K_{n_{j_k}}$ converges as $k \to \infty$, and by line (A.3) we have a bound on the limiting entries:

\[ (\Sigma_{AB} \Sigma_{BB}^{-1} K)_{ij} = O(g_2(a_i, b_j)). \]

We wish to make it clear that we only attempt to define the expression $(\Sigma_{AB} \Sigma_{BB}^{-1} K)$ considered as a whole, with entries determined as limits in the manner described above. In particular, we do not define $\Sigma_{BB}^{-1}$ or any of the other terms individually.

The quantity $K_n^{-1}(X_{B_n} - \mu_{B_n})$ has a multivariate normal distribution, and each coordinate is a mean-zero normal random variable with variance at most $C/c$. Since $K_n$ is diagonal, we can define $K^{-1}(X_B - \mu_B)$ as the Gaussian process which, for any $n$, is equal to $K_n^{-1}(X_{B_n} - \mu_{B_n})$ in the first $n$ coordinates, and this is well defined by the Kolmogorov consistency theorem.

We have

\[ \begin{aligned} \mu_{A|B_{n_{j_k}}} &= \mu_A + \Sigma_{AB_{n_{j_k}}} \Sigma_{B_{n_{j_k}}B_{n_{j_k}}}^{-1} K_{n_{j_k}} K_{n_{j_k}}^{-1} (X_{B_{n_{j_k}}} - \mu_{B_{n_{j_k}}}) \\ &\to \mu_A + \Sigma_{AB} \Sigma_{BB}^{-1} K \, K^{-1} (X_B - \mu_B), \end{aligned} \tag{A.4} \]

where the expression in the final line is a vector of convergent sums with probability one. To see that these sums are convergent, note that $(\Sigma_{AB} \Sigma_{BB}^{-1} K)_{ki} = O(g_2(k,i))$, that the entries of $K^{-1}$ are bounded above and so by Proposition A.5 $(K^{-1}(X_B - \mu_B))_i = O(i^{\delta})$ with probability one, and finally that, by assumption, the resulting series $\sum_i g_2(k,i)\, i^{\delta}$ is finite.

Thus $\mu_{A|B_{n_{j_k}}}$ has a limit as $k \to \infty$ with probability one.

Now, if for any fixed choice of $x_B$ the function $f_{n_{j_k}}(x_A, x_B) = f_{n_{j_k}}(x_A, x_{B_{n_{j_k}}})$ is the multivariate normal density with mean $\mu_{A|B_{n_{j_k}}}$ and covariance matrix $\Sigma_{A|B_{n_{j_k}}}$ (so that $f_{n_{j_k}}(x_A, x_B)$ does not depend on the values of $x_{B \setminus B_{n_{j_k}}}$), we have for all $E \in \sigma(X_A, X_{B_{n_{j_k}}})$ that

\[ P(E) = \int_E f_{n_{j_k}}(x_A, x_B) \, dx_A \, d\nu(x_B). \]

Because of the convergence of the conditional mean and covariance matrix, we have pointwise that

\[ f_{n_{j_k}}(x_A, x_B) \to f(x_A, x_B), \]

where $f(x_A, x_B)$ is the multivariate normal density with the corresponding limiting mean and covariance matrix.

Next, by the uniform eigenvalue bounds on $\Sigma_{A|B_{n_{j_k}}}$, we have that $f_{n_{j_k}}$ is bounded above uniformly in $k$. Thus, we may apply the dominated convergence theorem and obtain that for all $E \in \sigma(X_A, X_{B_{n_{j_k}}})$, we have

\[ P(E) = \int_E f(x_A, x_B) \, dx_A \, d\nu(x_B). \tag{A.5} \]

Since this holds for arbitrary $k$, this equality holds for any $E \in \bigcup_k \sigma(X_A, X_{B_{n_{j_k}}})$. The set of events for which this equality holds forms a Dynkin system, and since $n_{j_k} \to \infty$, the equality holds for all $E \in \bigcup_n \sigma(X_A, X_{B_n})$. Since this is an algebra generating the $\sigma$-algebra $\sigma(X_A, X_B)$, we may conclude by Dynkin's $\pi$-$\lambda$ theorem that the equality in line (A.5) holds for all $E \in \sigma(X_A, X_B)$.

Thus, we have shown

\[ X_A \mid X_B \sim \mathcal{N}\big(\mu_A + (f_B(X_B))_A,\ \Sigma_{A|B}\big). \]

□


Details of Example 6.9. We verify the conditions of Theorem 6.2 in the case where the dimension r = 2, and note that the proof for higher r is similar. For the sake of consistency with the technical notation used elsewhere, we reindex the variables $\{X_z : z \in \mathbb{Z}^r\}$ by $\mathbb{N}$, so that we have a bijection $f : \mathbb{N} \to \mathbb{Z}^r$, and we will from now on denote $X_{f(n)}$ by $X_n$. Note that $f(n)$ still refers to the “location” of the variable $X_n$ for the sake of determining covariances.

Figure 1. An arrangement of nodes appropriate for the argument in Example 6.9.

The uniform upper bound on the eigenvalues of $\Sigma_m$ for each $m \in \mathbb{N}$ is easily verified by bounding the row sums, which is possible due to the Gaussian rate of decay in the covariances, and can be done uniformly over all rows due to the stationarity of the covariance function. The uniform Gaussian rate of decay also allows one to verify the required conditions on $g_r$ for $r \in \{0,1,2,3,4\}$. The arguments are similar to the derivation of line (B.26) in Example 6.7 and we omit them here. However, we must still verify the uniform positive lower bound on the eigenvalues of the matrices $\Sigma_m$.

From the Cauchy interlacing theorem for principal submatrices (as in [25]), it is enough to verify these lower bounds for a subsequence of the $\Sigma_m$, where we may reorder the $X_n$ to suit our purposes. Moreover, for the sake of verifying the lower bound on the eigenvalues of the $\Sigma_m$, we may assume without loss of generality (again by the Cauchy interlacing theorem) that the coordinate function $f : \mathbb{N} \to \mathbb{Z}^2$ is surjective. Let us order the $X_n$ so that the random variables in the set $\{X_1, \ldots, X_{(2m+1)^2}\}$ correspond to the $(2m+1)^2$ nodes in a square centered at $(0,0)$ with side length $2m+1$, as in Figure 1.

We now follow arguments similar to those in fO] in order to obtain lower bounds on the eigenvalues of the 
matrices $\Sigma_{(2m+1)^2}$ in terms of a lower bound of a certain Fourier series. Gray works specifically with Toeplitz
matrices, but the arguments generalize to this situation because the matrices $\Sigma_{(2m+1)^2}$ are block Toeplitz with
Toeplitz blocks (higher-dimensional analogues also exist).

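Let $f$ be the function on $[0, 2\pi]^2$ defined by

\[ f(x,y) = \sum_{j,k=-\infty}^{\infty} \exp\Big(-\frac{j^2 + k^2}{V}\Big) e^{i(jx + ky)}. \]

This definition implies the relationship

\[ \exp\Big(-\frac{j^2 + k^2}{V}\Big) = \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{2\pi} f(x,y)\, e^{-i(jx + ky)} \, dx \, dy. \]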

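We have for a length $(2m+1)^2$ vector $x$, with entries indexed by $(j,k) \in \{-m, \ldots, m\}^2$, that

\[ \begin{aligned} x^* \Sigma_{(2m+1)^2} x &= \sum_{j,k,r,s=-m}^{m} \exp\Big(-\frac{(j-r)^2 + (k-s)^2}{V}\Big) \bar{x}_{(j,k)}\, x_{(r,s)} \\ &= \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{2\pi} f(u,v)\, \Big| \sum_{j,k=-m}^{m} x_{(j,k)}\, e^{i(ju + kv)} \Big|^2 du \, dv. \end{aligned} \]

Similarly,

\[ x^* x = \frac{1}{4\pi^2} \int_0^{2\pi}\!\!\int_0^{2\pi} \Big| \sum_{j,k=-m}^{m} x_{(j,k)}\, e^{i(ju + kv)} \Big|^2 du \, dv. \]

Let $m_f = \operatorname{ess\,inf} f(x,y)$ over $[0, 2\pi]^2$, and similarly $M_f = \operatorname{ess\,sup} f(x,y)$. Then combining the above, we have that if $\lambda$ is any eigenvalue of $\Sigma_{(2m+1)^2}$,

\[ m_f \le \min_x \frac{x^* \Sigma_{(2m+1)^2} x}{x^* x} \le \lambda \le \max_x \frac{x^* \Sigma_{(2m+1)^2} x}{x^* x} \le M_f. \]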

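Recall

\[ f(x,y) = \sum_{j,k=-\infty}^{\infty} \exp\Big(-\frac{j^2 + k^2}{V}\Big) e^{i(jx + ky)} = \Big( \sum_{j=-\infty}^{\infty} \exp\Big(-\frac{j^2}{V}\Big) e^{ijx} \Big) \Big( \sum_{k=-\infty}^{\infty} \exp\Big(-\frac{k^2}{V}\Big) e^{iky} \Big), \]

and so if we define

\[ g(x, V) = \sum_{j=-\infty}^{\infty} \exp\Big(-\frac{j^2}{V}\Big) e^{ijx}, \]

then provided that $\min_x g(x, V) > 0$ we have that

\[ m_f = \big( \min_x g(x, V) \big)^2, \]

since $g(x, V)$ is a continuous function of $x$ for any given $V > 0$.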

Thus, in order to solve our problem, it is enough to show that for any given $V > 0$, the value $\min_x g(x, V)$ is positive. This function $g$ is a special case of the well-understood Jacobi theta function, which is defined by

\[ \vartheta(z \mid \tau) = \sum_{n=-\infty}^{\infty} \exp(\pi i n^2 \tau + 2 \pi i n z). \]

In particular, we have that

\[ g(x, V) = \vartheta\Big( \frac{x}{2\pi} \,\Big|\, \frac{i}{\pi V} \Big). \]

Thus a specific choice of (real) $V$ corresponds to $\tau = \frac{i}{\pi V}$, and any choice of real $x$ corresponds to $z = \frac{x}{2\pi}$, which is also real.

The zeros of the Jacobi theta function are known: $\vartheta(z \mid \tau) = 0$ if and only if $z = 1/2 + \tau/2 + n + m\tau$ for some $n, m \in \mathbb{Z}$ [21]. In particular, for $\tau$ purely imaginary with positive imaginary part, $\vartheta(z \mid \tau)$ is nonzero for all real $z$. Thus, we may conclude that

\[ g(x, V) \neq 0 \text{ for all } x \in [0, 2\pi],\ V > 0. \]

But for any fixed $x$, the function $g(x, V)$ is real valued and continuous as a function of real $V > 0$, and for all $x$ we have $\lim_{V \to 0} g(x, V) = 1$. Since for any fixed real $x$ we know $g(x, V) \neq 0$ for any $V > 0$, by the intermediate value theorem we may conclude

\[ g(x, V) > 0 \text{ for all } x \in [0, 2\pi],\ V > 0. \]

Since $g(x, V)$ is a continuous function in $x$ for any fixed $V$, the image of the compact interval $[0, 2\pi]$ under $g(\cdot, V)$ is also a compact interval, call it $[a, b]$. By the above argument, the lower endpoint of this interval must be positive, so in particular $g(x, V) \ge a > 0$ for all $x \in [0, 2\pi]$.

Thus we have shown that $\min_x g(x, V) \ge a > 0$, concluding the proof that the eigenvalues of $\Sigma_{(2m+1)^2}$ are uniformly positive, and thus allowing us to conclude from Theorem 6.2 that (IIP) and (DCP), and hence (P5*), are satisfied for this process. □


Remark A.1. The details of the above example were carried out over $\mathbb{Z}^m$ for $m = 2$, but the same argument works for arbitrary dimension $m$ by considering an $m$-dimensional Fourier series, and again reducing the problem to a uniform positive lower bound of $g(x, V)$.

Remark A.2. The Jacobi theta function argument demonstrating the positivity of $m_f$ above leverages the form of the Gaussian covariance function. For other covariance functions, it is often straightforward to verify numerically a lower bound on $m_f$, and therefore on the eigenvalues of the $\Sigma_m$.
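The numerical verification mentioned in Remark A.2 can be sketched as follows for the Gaussian case treated above. This is an illustrative check only, under the assumption that $g(x, V) = \sum_j \exp(-j^2/V) e^{ijx}$, which for real $x$ equals $1 + 2\sum_{j \ge 1} \exp(-j^2/V) \cos(jx)$; the truncation level `terms`, the grid size, and the tested values of $V$ are arbitrary choices, and a grid minimum is evidence rather than a proof of a lower bound.

```python
import math


def g(x, V, terms=50):
    """Truncated series for g(x, V) = 1 + 2 * sum_{j>=1} exp(-j^2 / V) * cos(j x)."""
    return 1.0 + 2.0 * sum(
        math.exp(-j * j / V) * math.cos(j * x) for j in range(1, terms + 1)
    )


def min_g(V, grid=2001):
    """Minimum of g(., V) over an evenly spaced grid on [0, 2*pi]."""
    return min(g(2.0 * math.pi * k / (grid - 1), V) for k in range(grid))


def lower_bound_mf(V):
    """Grid estimate of m_f = (min_x g(x, V))^2 for the two-dimensional case."""
    return min_g(V) ** 2


# Grid minima for a few (arbitrarily chosen) values of V.
minima = {V: min_g(V) for V in (0.5, 1.0, 2.0, 4.0)}
```

For a different stationary covariance function, one would replace the coefficients $\exp(-j^2/V)$ in `g` by the corresponding covariance values and check the grid minimum in the same way.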


A.2. Proofs from Section 7. Details of Example 7.4. We appeal to Theorem 2.8. Point (1) of the statement of the example implies point (1) of Theorem 2.8, so we just need to verify point (2).

Suppose that $A \in \sigma(X_n, X_{n+1}, \ldots)$, and that $m$ is fixed. Recall that

\[ P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B\big) > \varepsilon_m \]

for any finite $B$, and so by the Lévy zero-one law, we have $P((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B) \ge \varepsilon_m$ a.s. $X_B$ for any $i = (i_1, \ldots, i_m)$ and any (potentially infinite) $B \subset \{m+1, m+2, \ldots\}$.

Let $B \subseteq \{m+1, m+2, \ldots\}$ be arbitrary. Note that since $\tau < \infty$ almost surely, we have that $\#\{b \in B \mid X_b(\omega) = 1\} < \infty$ almost surely, so we may restrict our attention to $x_B$ which are nonzero in only finitely many places.

Define the events

\[ C_n := \{X_i = 0 \text{ for all } i \ge n,\ i \notin B\}, \quad \text{and} \quad D_n := C_n^c = \{X_i = 1 \text{ for some } i \ge n,\ i \notin B\}. \]

Note that by the definition of $\sigma(X_n, X_{n+1}, \ldots)$, either $C_n \cap A = C_n$ or $C_n \cap A = \emptyset$. We treat these two cases separately.
separately. 

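First, suppose $C_n \cap A = C_n$. Then

\[ P(C_n \mid X_1, \ldots, X_m) \le P(A \mid X_1, \ldots, X_m) \le 1, \]

and so

\[ \begin{aligned} \mathrm{Var}\big(P(A \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B)\big) &\le 1 - P\big(C_n \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B\big) \\ &= P\big(D_n \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B\big) \\ &= \frac{P\big(D_n,\ (X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B = x_B\big)}{P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B = x_B\big)} \\ &\le \frac{P(D_n \mid X_B = x_B)}{P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B = x_B\big)} \\ &\le \frac{1}{\varepsilon_m} P(D_n \mid X_B = x_B) =: g_{m,B,x_B}(n). \end{aligned} \]

On the other hand, suppose that $C_n \cap A = \emptyset$. Then $A \subset D_n$, so

\[ 0 \le P\big(A \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B\big) \le P\big(D_n \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B\big), \]

and so

\[ \begin{aligned} \mathrm{Var}\big(P(A \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B)\big) &\le P\big(D_n \mid (X_1, \ldots, X_m) = (i_1, \ldots, i_m),\ X_B = x_B\big) \\ &\le \frac{1}{\varepsilon_m} P(D_n \mid X_B = x_B) = g_{m,B,x_B}(n). \end{aligned} \]

Thus, to verify (2) of Theorem 2.8, it is enough to check that $P(D_n \mid X_B = x_B)$ tends to zero as $n$ tends to infinity. As noted above, we may assume that $\{b \in B \mid x_b = 1\}$ is finite, and let $r = \max\{b \in B \mid x_b = 1\}$.

Let $B_r = B \cap \{1, \ldots, r\}$. Then for all $n > r$,

\[ P(D_n \mid X_B = x_B) \le P(D_n \mid X_{B_r} = x_{B_r}) \le \frac{1}{\varepsilon_r} P(D_n). \]

So it is enough to show that $\lim_{n \to \infty} P(D_n) = 0$.

By assumption, $\tau := \#\{i : X_i(\omega) = 1\} < \infty$ almost surely, and so $\max\{i \mid X_i(\omega) = 1\} < \infty$ almost surely. Therefore,

\[ P(D_n) \le P\big(\max\{i \mid X_i(\omega) = 1\} \ge n\big) \to 0 \text{ as } n \to \infty, \]

concluding the proof. □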


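Details of Example 7.7. For $\theta$ satisfying these conditions, $\sum_k \theta_{k0} x_k = -\infty$ for any choice of $x$ with infinitely many $x_k = 1$, and so the only $x$ for which $U(x)$ could possibly be nonzero are those with only finitely many $x_i = 1$.

We have that for some constant $C$,

\[ \sum_{k=1}^{\infty} \exp(\theta_{k0}) \le \sum_{k=1}^{\infty} \frac{1}{k^2} =: C. \]

Because of this, and because the remaining parameters $\theta_{jk}$ are negative, we have that

\[ \sum_x U(x) \le \sum_{n=0}^{\infty} \sum_{k_1 < \cdots < k_n} \exp\Big( \sum_{j=1}^{n} \theta_{k_j 0} \Big) \le \sum_{n=0}^{\infty} \frac{1}{n!} \Big( \sum_{k=1}^{\infty} \exp(\theta_{k0}) \Big)^n \le \sum_{n=0}^{\infty} \frac{C^n}{n!} = \exp(C), \]

where the $n = 0$ term accounts for the all-zero configuration.

Thus we may normalize $U$ to a probability distribution $P$ on $\{0,1\}^\infty$, and we note that under $P$, the quantity $\#\{i : X_i = 1\}$ is almost surely finite.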
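The normalization bound above can be checked numerically on a truncated model. The sketch below is illustrative only and rests on several assumptions that are not in the text: the model is truncated to `N` variables, the nonzero nodes form a chain, the field parameters are $\theta_{k0} = -2\log k - 0.1$, and the chain couplings are all equal to the hypothetical value `COUPLING`. It sums $U(x)$ over all configurations and compares the result with $\exp(C)$ for $C = \sum_k 1/k^2 = \pi^2/6$.

```python
import itertools
import math

N = 10           # truncation level (an assumption made for this illustration)
COUPLING = -0.5  # hypothetical negative coupling theta_{k,k+1} along a chain


def theta0(k):
    """Hypothetical field parameter satisfying theta_{k0} < -2 log k."""
    return -2.0 * math.log(k) - 0.1


def U(x):
    """Unnormalized density (7.1) for x = (x_1, ..., x_N), with X_0 = 1."""
    exponent = sum(theta0(k + 1) * xk for k, xk in enumerate(x))
    exponent += sum(COUPLING * x[k] * x[k + 1] for k in range(len(x) - 1))
    return math.exp(exponent)


# Normalizing constant of the truncated model and the bound exp(C) from the proof.
Z = sum(U(x) for x in itertools.product((0, 1), repeat=N))
C = math.pi ** 2 / 6
bound = math.exp(C)
```

Since every term dropped by the truncation is nonnegative and the bound $\exp(C)$ does not depend on `N`, increasing `N` can only move `Z` up toward a limit that stays below `bound`.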

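Next, for any finite $B$ disjoint from $\{1, \ldots, m\}$, we have

\[ P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B = x_B\big) \ge \min_x P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_{m+1} = x_{m+1},\ X_{m+2} = x_{m+2},\ \ldots\big). \]

In the following lines, we will use the notation that for $n > m$ we have $i_n = x_n$ and $y_n = x_n$. Then for a specific choice of $x$, we have

\[ P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_{m+1} = x_{m+1},\ X_{m+2} = x_{m+2},\ \ldots\big) = \frac{\exp\big( \sum_{(j,k) \in E,\ j \text{ or } k \in [1,m]} \theta_{jk}\, i_j i_k \big)}{\sum_{y \in \{0,1\}^m} \exp\big( \sum_{(j,k) \in E,\ j \text{ or } k \in [1,m]} \theta_{jk}\, y_j y_k \big)}, \]

and since the graph under consideration has finite degree at each nonzero node, and $\theta_{jk} > -\infty$ for each $(j,k) \in E$, the denominator in the above expression is bounded above uniformly in the choice of $x$, and so we have

\[ P\big((X_1, \ldots, X_m) = (i_1, \ldots, i_m) \mid X_B = x_B\big) \ge \varepsilon_m \]

uniformly in $B$ for some positive $\varepsilon_m$.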







DAVID MONTAGUE AND BALA RAJARATNAM 


Thus, by the result of Example 7.4, this collection of $X_i$ satisfies (IIP) and (DCP).

Note moreover that it is straightforward to directly verify that line (7.2) holds for any (even infinite) collection of $(X_n)_{n \in \mathbb{N}}$ satisfying the above hypotheses, and so the $X_n$ satisfy (P*) with respect to the induced subgraph of $\mathcal{G}$ obtained by removing the zero node. □

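The following result is used in verifying the details of Example 7.9.

Proposition A.3. Given a graph $\mathcal{G} = (\{0\} \cup \mathbb{N}, E)$ and parameter set $\theta$, the infinite Ising model (as in Definition 7.8) with this graph and parameter set is well-defined and satisfies $P((X_1, \ldots, X_m) = v) > 0$ for all $m, v$ if and only if

\[ \lim_{n \to \infty} f_m(v, n) \]

exists, and is finite and nonzero for all $v \in \{0,1\}^m$ and $m \in \mathbb{N}$, where

\[ f_m(v, n) = \frac{\sum_{x \in \{0,1\}^{n-m}} \exp\big( \sum_{i,j > m} \theta_{ij} x_i x_j + \sum_{i \le m,\ j > m} \theta_{ij} v_i x_j + \sum_{j > m} \theta_{0j} x_j \big)}{\sum_{x \in \{0,1\}^{n-m}} \exp\big( \sum_{i,j > m} \theta_{ij} x_i x_j + \sum_{j > m} \theta_{0j} x_j \big)}. \]

Proof. The proof is included in the Supplemental section. □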


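Details of Example 7.9. We first appeal to Proposition A.3 to obtain the existence of the joint limiting distribution.

By definition,

\[ f_m(v, n) = \frac{\sum_{x \in \{0,1\}^{n-m}} \exp\big( \sum_{i,j > m} \theta_{ij} x_i x_j + \sum_{i \le m,\ j > m} \theta_{ij} v_i x_j + \sum_{j > m} \theta_{0j} x_j \big)}{\sum_{x \in \{0,1\}^{n-m}} \exp\big( \sum_{i,j > m} \theta_{ij} x_i x_j + \sum_{j > m} \theta_{0j} x_j \big)}, \]

and so for sufficiently large $n$ (such that $n + 1$ is not adjacent to any nonzero node with index less than or equal to $m$)

\[ f_m(v, n+1) = \frac{\sum_{x \in \{0,1\}^{n-m}} \exp\big( \sum_{i,j > m} \theta_{ij} x_i x_j + \sum_{i \le m,\ j > m} \theta_{ij} v_i x_j + \sum_{j > m} \theta_{0j} x_j \big) \big( 1 + \exp\big( \sum_{j \le n} \theta_{(n+1)j} x_j \big) \big)}{\sum_{x \in \{0,1\}^{n-m}} \exp\big( \sum_{i,j > m} \theta_{ij} x_i x_j + \sum_{j > m} \theta_{0j} x_j \big) \big( 1 + \exp\big( \sum_{j \le n} \theta_{(n+1)j} x_j \big) \big)}. \]

Define

\[ a_n := \frac{f_m(v, n+1)}{f_m(v, n)}, \qquad \beta_n := \sum_j |\theta_{(n+1)j}|. \]

Note that $\sum_n \beta_n < \infty$ by hypothesis.

Thus, based on the above computations we have

\[ \begin{aligned} |a_n - 1| &\le \max_{x, y} \left| 1 - \frac{1 + \exp\big( \sum_{j \le n} \theta_{(n+1)j} x_j \big)}{1 + \exp\big( \sum_{j \le n} \theta_{(n+1)j} y_j \big)} \right| \\ &\le \max\left\{ \left| 1 - \frac{1 + \exp(\beta_n)}{1 + \exp(-\beta_n)} \right|,\ \left| 1 - \frac{1 + \exp(-\beta_n)}{1 + \exp(\beta_n)} \right| \right\} \\ &= \frac{\exp(\beta_n) - \exp(-\beta_n)}{1 + \exp(-\beta_n)}. \end{aligned} \]

Noting that $\beta_n \to 0$, we have from the previous line that for all sufficiently large $n$, the quantity $|a_n - 1| \le 2\beta_n$. Since $\sum \beta_n < \infty$, the above bound implies that the product $\prod_{n=1}^{\infty} a_n$ exists and is nonzero and finite. Since $f_m(v, 1) > 0$ and

\[ f_m(v, N+1) = f_m(v, 1) \prod_{n=1}^{N} a_n, \]

we may therefore conclude that the limit $\lim_{n \to \infty} f_m(v, n)$ exists and is nonzero for all $v$ and $m$. Thus, by Proposition A.3 we obtain the existence of a limiting distribution which satisfies

\[ \begin{aligned} P\big((X_1, \ldots, X_n) = (i_1, \ldots, i_n) \mid X_B = (j_1, \ldots, j_m)\big) &\ge \min_{j \in \{0,1\}^{|\mathrm{ne}(A)|}} P\big((X_1, \ldots, X_n) = (i_1, \ldots, i_n) \mid X_{\mathrm{ne}(A)} = j\big) \\ &=: \varepsilon_n > 0 \end{aligned} \]

uniformly over all finite choices of $B$ and $j$, where $A = \{1, \ldots, n\}$.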

In addition, if $B$ is arbitrary and $A \in \sigma(X_n, X_{n+1}, \ldots)$, then

\[ \begin{aligned} \mathrm{Var}\big(P(A \mid X_1, \ldots, X_m, X_B) \mid X_B = x_B\big) &\le \max P(A \mid X_1, \ldots, X_m, X_B) - \min P(A \mid X_1, \ldots, X_m, X_B) \\ &\le \max P(A \mid X_{\mathrm{ne}(A)}) - \min P(A \mid X_{\mathrm{ne}(A)}) \\ &\le \exp\Big( \sum_{j,k \ge \min(\{n\} \cup \mathrm{ne}\{n, \ldots, N\})} |\theta_{jk}| \Big) - \exp\Big( -\sum_{j,k \ge \min(\{n\} \cup \mathrm{ne}\{n, \ldots, N\})} |\theta_{jk}| \Big), \end{aligned} \]

and this bound goes to zero as $n$ goes to infinity since each node in $\mathcal{G}$ is adjacent to at most finitely many others (and so $\min \mathrm{ne}(A) \to \infty$), and since $\sum_{(j,k) \in E} |\theta_{jk}| < \infty$.

Thus, both requirements of Theorem 2.8 are satisfied, and so this example satisfies (IIP) and (DCP), and also (P5*) by Theorem 5.7.

We now show that (P*) is satisfied. The same argument used in Example 7.7 allows us to compute the conditional distributions for each $P_n$ and show that the $X_n$ satisfy (L*) with respect to the graph $\mathcal{G}_n$ and the probability distribution $P_n$ for all sufficiently large $n$. That is, for any finite $A \subset \mathbb{N}$ and any finite $B \subset \mathbb{N}$ disjoint from $A$ and the neighbor set of $A$, we have $A \perp\!\!\!\perp B \mid \mathrm{ne}(A)$ with respect to $P_n$ for all sufficiently large $n$, and therefore for the limiting $P$ as well. Noting that the conditional independence of two infinite-dimensional processes is equivalent to the conditional independence of all of their finite-dimensional distributions, we obtain that (L*) is satisfied for the infinite-dimensional limiting distribution $P$, and therefore that (P*) is as well.

Finally, we demonstrate the existence of the density for the quantity

\[ f(X) = \sum_{i=1}^{\infty} 2^{-i} X_i. \]

For any $m < n$, we have

\[ P_n^m\big((X_1, \ldots, X_m) = (x_1, \ldots, x_m)\big) = \frac{\sum_{y \in \{0,1\}^n :\ y_k = x_k,\ k \le m} \exp\big( \sum_{i,j} \theta_{ij} y_i y_j \big)}{\sum_{y \in \{0,1\}^n} \exp\big( \sum_{i,j} \theta_{ij} y_i y_j \big)}, \]

and we can provide bounds on this quantity of the form

\[ P_n^m\big((X_1, \ldots, X_m) = (x_1, \ldots, x_m)\big) \le \frac{2^{n-m} \exp\big( \sum |\theta_{ij}| \big)}{2^n \exp\big( -\sum |\theta_{ij}| \big)} = \frac{\exp\big( 2 \sum |\theta_{ij}| \big)}{2^m}, \]

and also

\[ P_n^m\big((X_1, \ldots, X_m) = (x_1, \ldots, x_m)\big) \ge \frac{2^{n-m} \exp\big( -\sum |\theta_{ij}| \big)}{2^n \exp\big( \sum |\theta_{ij}| \big)} = \frac{\exp\big( -2 \sum |\theta_{ij}| \big)}{2^m}. \]

Thus, we have shown that for any $n > m$, we have a constant $C = \exp(2 \sum |\theta_{ij}|)$ such that

\[ \frac{1}{C\, 2^m} \le P_n^m\big((X_1, \ldots, X_m) = (x_1, \ldots, x_m)\big) \le \frac{C}{2^m}, \]

and so this holds for the finite-dimensional distribution of the limiting distribution, $P^m$, as well. This implies that the limiting distribution $P^m$ is such that

\[ f_m(X) := \sum_{i=1}^{m} 2^{-i} X_i \]

has a density with respect to Lebesgue measure for each $m$, and moreover that the limiting function $f$ and distribution $P$ (obtained by letting $m \to \infty$) do as well, by taking the pointwise limit and appealing to the dominated convergence theorem. □


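A.3. Auxiliary Lemmas. We conclude the appendix with three auxiliary lemmas that have been useful at various points in the paper.


Lemma A.4. Suppose that $\mathcal{F} \subset \mathcal{G}$ are $\sigma$-algebras on a probability space such that for any $A \in \mathcal{G}$ there is some $A_{\mathcal{F}} \in \mathcal{F}$ such that $P(A \,\Delta\, A_{\mathcal{F}}) = 0$. Then for any bounded measurable function $f$ we have $E[f \mid \mathcal{F}] = E[f \mid \mathcal{G}]$ almost surely.


Proof. Since $\mathcal{F} \subset \mathcal{G}$, we have that $E[f \mid \mathcal{F}]$ is $\mathcal{G}$-measurable. Now, let $S \in \mathcal{G}$ be arbitrary, and $S_{\mathcal{F}} \in \mathcal{F}$ such that $P(S \,\Delta\, S_{\mathcal{F}}) = 0$. Then

\[ E\big[\mathbf{1}_S\, E[f \mid \mathcal{F}]\big] = E\big[\mathbf{1}_{S_{\mathcal{F}}}\, E[f \mid \mathcal{F}]\big] = E\big[\mathbf{1}_{S_{\mathcal{F}}}\, f\big] = E\big[\mathbf{1}_S\, f\big], \tag{A.6} \]

where in line (A.6) we have used the definition of conditional expectation to obtain the second equality, and for the other two the fact that $P(S \,\Delta\, S_{\mathcal{F}}) = 0$, which ensures that these steps cause the expression to change by an integral over a set of measure zero, and therefore preserve equality. □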


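Proposition A.5. Let $Z_1, Z_2, \ldots$ be a collection of standard normal random variables (not necessarily i.i.d.). Then for any $\delta > 0$, we have

\[ P(Z_n < n^{\delta} \text{ for all sufficiently large } n) = 1. \]

Proof. We have

\[ \begin{aligned} P(Z_n < n^{\delta} \text{ for all sufficiently large } n) &= P\Big( \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \{Z_n < n^{\delta}\} \Big) = \lim_{m \to \infty} P\Big( \bigcap_{n=m}^{\infty} \{Z_n < n^{\delta}\} \Big) \\ &= \lim_{m \to \infty} 1 - P\Big( \bigcup_{n=m}^{\infty} \{Z_n > n^{\delta}\} \Big) \ge 1 - \lim_{m \to \infty} \sum_{n=m}^{\infty} P(Z_n > n^{\delta}). \end{aligned} \tag{A.7} \]

Thus, to show the claim it is enough to show that

\[ \sum_{n=1}^{\infty} P(Z_n > n^{\delta}) < \infty, \tag{A.8} \]

since then the final limit from line (A.7) will be zero.

By the standard tail bounds for the normal distribution, we have

\[ \sum_{n=1}^{\infty} P(Z_n > n^{\delta}) \le \sum_{n=1}^{\infty} \frac{1}{n^{\delta} \sqrt{2\pi}} \exp(-n^{2\delta}/2) \le \sum_{n=1}^{\infty} \exp(-n^{2\delta}/2). \tag{A.9} \]

Now, a comparison with $1/n^2 = \exp(-2 \log(n))$ shows that the terms from line (A.9) are eventually bounded above by the terms of this convergent series, and so we have verified line (A.8), and hence the claim. □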
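The two comparisons used in the proof of Proposition A.5 are easy to check numerically. The sketch below is an illustration only: the choice $\delta = 0.5$ and the range of $n$ are arbitrary, and the function names are hypothetical. It evaluates the exact normal tail with the complementary error function, the standard upper bound $\exp(-t^2/2)/(t\sqrt{2\pi})$, and the eventual comparison of $\exp(-n^{2\delta}/2)$ with $1/n^2$.

```python
import math


def normal_tail(t):
    """P(Z > t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_upper_bound(t):
    """Standard bound P(Z > t) <= exp(-t^2 / 2) / (t * sqrt(2 * pi)), valid for t > 0."""
    return math.exp(-t * t / 2.0) / (t * math.sqrt(2.0 * math.pi))


delta = 0.5            # arbitrary illustrative exponent
ns = range(1, 401)     # kept small so all values stay in the normal floating-point range

tails = [normal_tail(n ** delta) for n in ns]
bounds = [tail_upper_bound(n ** delta) for n in ns]
partial_sum = sum(tails)

# For delta = 0.5, exp(-n / 2) <= 1 / n^2 from n = 9 onward (checked here up to 400).
eventually_dominated = all(
    math.exp(-(n ** (2 * delta)) / 2.0) <= 1.0 / n**2 for n in range(9, 401)
)
```

A smaller $\delta$ pushes the index from which $\exp(-n^{2\delta}/2) \le 1/n^2$ holds further out, but the comparison series $\sum 1/n^2$ is unchanged.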


Proposition A.6. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{A} \subset \mathcal{F}$ be an algebra generating $\mathcal{F}$. Then for all $B \in \mathcal{F}$ and $\varepsilon > 0$, we can find $A \in \mathcal{A}$ such that

\[ P(A \,\Delta\, B) < \varepsilon. \]


Proof. This is part of a hint to exercise 1.12.102 in [1]. □

Appendix B. Supplemental Section

Proposition B.1. Suppose $V$ is finite, and that $\{X_v \mid v \in V\}$ is a collection of random variables. Then the following are equivalent:

• (P5) For $X, Y, Z, W$ any finite collections of the $X_v$, if $X \perp\!\!\!\perp Y \mid (W, Z)$ and $X \perp\!\!\!\perp W \mid (Y, Z)$, then $X \perp\!\!\!\perp (Y, W) \mid Z$.

• (P5*) For $X, Z, A_1, \ldots, A_n$ any finite collections of the $X_v$, if $X \perp\!\!\!\perp A_i \mid (Z, (A_j)_{j \neq i})$ for all $i \le n$, then $X \perp\!\!\!\perp (A_1, \ldots, A_n) \mid Z$.

Proof First, note that by Lemma [5^ (PI) - (P4) hold for the given collection of random variables. 

(P5*) ⇒ (P5):

This follows from the case $n = 2$.


(P5) ⇒ (P5*):

We shall proceed by induction on $n$. (P5) gives the base case $n = 2$. Suppose now that (P5*) holds for any collection of subsets $A_i \subset V$, $i \le n$.

Let $A_i$, $i \le n + 1$, be arbitrary finite collections of the $X_v$. Then by (P5),

\[ X \perp\!\!\!\perp (A_j, A_{n+1}) \mid (Z, (A_i)_{i \neq j, n+1}) \text{ for all } 1 \le j \le n. \]

Let $B_i = (A_i, A_{n+1})$. Then $X \perp\!\!\!\perp B_i \mid (Z, (B_j)_{j \neq i})$, and so by the induction hypothesis, $X \perp\!\!\!\perp (B_1, \ldots, B_n) \mid Z$.

By property (P2), $X \perp\!\!\!\perp (A_1, \ldots, A_{n+1}) \mid Z$. This completes the induction step. □

Remark B.2. The above proof demonstrates that the equivalence of (P5*) and (P5) holds more generally at 
the level of ternary relations, as opposed to just the relation induced by conditional independence. 

Example B.3. Let $Y_0, Y_1, Y_2$ be i.i.d. $\{0,1\}$-valued Bernoulli random variables with $p = 1/4$, and let $Y_3 = Y_1 + Y_2$. Let $(X_n)_{n > 3}$ be a collection of independent Bernoulli random variables, which are also independent of the $Y_i$, such that $\sum_{n > 3} X_n < \infty$ with probability one. Finally, let

\[ \begin{aligned} X_1 &= Y_1 + \sum_{k=1}^{\infty} X_{3k+1} \mod 2, \\ X_2 &= Y_2 + \sum_{k=1}^{\infty} X_{3k+2} \mod 2, \\ X_3 &= Y_3 + \sum_{k=1}^{\infty} X_{3k+3} \mod 2. \end{aligned} \]

Also, let $X_0 = X_1 + Y_0$. Then the $(X_k)_{k=0}^{\infty}$ form a discrete stochastic process. Conditioning on the $\sigma$-algebra generated by $(X_4, X_5, \ldots)$ gives

\[ \begin{aligned} X_1 &= Y_1 + c_1(X_4, X_5, \ldots) \mod 2, \\ X_2 &= Y_2 + c_2(X_4, X_5, \ldots) \mod 2, \\ X_3 &= Y_3 + c_3(X_4, X_5, \ldots) \mod 2, \end{aligned} \]

for $c_1, c_2, c_3$ some $\{0,1\}$-valued functions of $(X_i)_{i \ge 4}$, so that

\[ X_3 = X_1 + X_2 + c_3(X_4, \ldots) - c_2(X_4, \ldots) - c_1(X_4, \ldots) \mod 2. \]

Because of the above relation, we have that

\[ X_0 \perp\!\!\!\perp X_1 \mid X_2, X_3, \ldots, \tag{B.1} \]

and

\[ X_0 \perp\!\!\!\perp X_2 \mid X_1, X_3, \ldots, \tag{B.2} \]

since $X_1$ and $X_2$ are constant given the remaining variables. However, it is not the case that

\[ X_0 \perp\!\!\!\perp (X_1, X_2) \mid X_3, X_4, \ldots \]

since the conditional distribution of $(X_1, X_2)$ is such that $X_1$ takes both values 0 and 1 with nonzero probability, and $Y_0$ is 1 with probability 1/4, so that $X_0$ is equal to $X_1$ with probability 3/4 and different with probability 1/4. The lack of this conditional independence despite lines (B.1) and (B.2) demonstrates that both (IIP) and (P5*) fail for this example. Moreover, in the above example (P*) ⇒ (G*) does not hold, since if we let $\mathcal{G}$ be the graph defined by (P*), then the set of nodes $\{3, 4, \ldots\}$ separates $\{0\}$ and $\{1, 2\}$, but if (G*) held this would imply $X_0 \perp\!\!\!\perp (X_1, X_2) \mid X_3, X_4, \ldots$, which we just disproved.

Note also that (P5) holds for every finite subcollection of these variables by Proposition 3.3, since every finite collection of $n$ of these variables has an everywhere-positive density with respect to the counting measure on $\{0,1\}^n$.

Finally, we demonstrate that any collection of $\{0,1\}$-valued random variables $X_n$ for which $\#\{n : X_n \neq 0\} < \infty$ with probability one (including the collection just described) necessarily satisfies (DCP). Suppose that $\{X_i \mid i \in \mathbb{N}\}$ is such a collection. Suppose also that $D \subset \mathbb{N}$ is arbitrary, and $E \in \bigcap_n \sigma(X_D, X_n, X_{n+1}, \ldots)$. With probability one, the infinite-dimensional vector of random variables $X = (X_1, X_2, \ldots)$ takes on one of only at most countably infinitely many values $x$ (namely, those with only finitely many ones), and so the event $E' = \{x \mid x \in E \text{ and } P(X = x) > 0\}$ satisfies $P(E \,\Delta\, E') = 0$. Let $n \notin D$ be fixed, and use the notation that if $x = (x_1, x_2, \ldots, x_n, \ldots)$, then $x' = (x_1, x_2, \ldots, 1 - x_n, \ldots)$. Define the $\sigma$-algebra

\[ \mathcal{F}_n = \{B \in \sigma(X_D, X_{n+1}, \ldots) \mid x \in B \Leftrightarrow x' \in B\}. \]

Then for any $m \in D$ or $m > n$, we have $\sigma(X_m) \subset \mathcal{F}_n$. Thus $\sigma(X_D, X_{n+1}, X_{n+2}, \ldots) \subset \mathcal{F}_n$. So, we have for any $n \notin D$ that $x \in E$ implies $x' \in E$. Thus, if we define $\pi(E) := \{x_D \mid x \in E\}$ so that $\pi$ is the projection onto the coordinates corresponding to variables in $D$, then $E = X_D^{-1}(\pi(E)) \in \sigma(X_D)$, and so we have directly verified that (DCP) holds.


Example B.4. Let $\theta$ be a $\{0,1\}$-valued Bernoulli random variable with probability 0.5. Let $(X_i)_{i \in \mathbb{N}}$ be a collection of i.i.d. $\mathcal{N}(0,1)$ random variables, and let $Y_i = X_i + \theta$.

Then the $Y_i$ satisfy (P*) with respect to the edgeless graph on $\mathbb{N}$, but do not satisfy (G*) with respect to this graph. This is evident from the fact that for any $i, j$, with probability one,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k \le n,\ k \neq i,j} Y_k = \theta,$$

so that conditioning on any set of all but two of the $Y_k$ determines the value of $\theta$ with probability one, and $Y_i$ is independent of $Y_j$ given $\theta$. However, the statement that (G*) holds with respect to the edgeless graph on $\mathbb{N}$ is equivalent to the statement that all of the random variables in question are marginally independent. But it is clear that $Y_i$ and $Y_j$ are not marginally independent since


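$$\begin{aligned}
\operatorname{Cov}(Y_i, Y_j) &= E[Y_i Y_j] - E[Y_i] E[Y_j] \\
&= 0.5 \left( E[X_i X_j] + E[(X_i + 1)(X_j + 1)] \right) - \left( 0.5 \left( E[X_i] + E[X_i + 1] \right) \right)^2 \\
&= 0.5 - 0.25 = 0.25 \neq 0.
\end{aligned}$$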
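
The covariance computed above can be sanity-checked numerically. The following is a minimal Monte Carlo sketch, not part of the original argument: the function name `simulate_cov`, the sample size, and the seed are illustrative choices, and the sketch assumes that $\theta$ is drawn independently of the Gaussian variables.

```python
import random

def simulate_cov(num_samples=200_000, seed=0):
    """Estimate Cov(Y_i, Y_j) where Y = X + theta, X ~ N(0,1) i.i.d.,
    and theta ~ Bernoulli(0.5) is shared by both coordinates.
    (Assumption: theta is independent of the X's.)"""
    rng = random.Random(seed)
    sum_i = sum_j = sum_ij = 0.0
    for _ in range(num_samples):
        theta = 1.0 if rng.random() < 0.5 else 0.0  # the shared shift
        y_i = rng.gauss(0.0, 1.0) + theta
        y_j = rng.gauss(0.0, 1.0) + theta
        sum_i += y_i
        sum_j += y_j
        sum_ij += y_i * y_j
    mean_i = sum_i / num_samples
    mean_j = sum_j / num_samples
    # Empirical version of E[Y_i Y_j] - E[Y_i] E[Y_j]
    return sum_ij / num_samples - mean_i * mean_j

estimate = simulate_cov()
print(f"estimated covariance: {estimate:.3f} (theory: 0.25)")
```

The estimate should land close to 0.25, which equals $\operatorname{Var}(\theta)$: the shared shift is the only source of dependence between the two coordinates.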


Note that (DCP) is not satisfied in this example since the event $\{\theta = 1\}$ is contained in the $\sigma$-algebra generated by any infinite collection of the random variables, but is not contained (even up to a measure zero modification) in the $\sigma$-algebra generated by any finite collection of the variables. Thus, letting $D = \emptyset$
in the definition of (DCP) shows that (DCP) fails to hold. On the other hand, it is easy to see that (IIP) 
is satisfied: any finite collection of the random variables has an everywhere-positive joint density, so the 
required implications of (IIP) involving a finite conditioning set will all hold; if the conditioning set is 
infinite, then the arguments from the second paragraph of this example show that all of the non-conditioned 
variables are conditionally independent, so that any implication required by (IIP) which involves an infinite 
conditioning set will also hold. 

More generally, for any graph $\mathcal{G}$ which contains a node $v$ with finite degree, and another node $w$ not adjacent to $v$, we may let $(X_i)_{i \in \mathbb{N}}$ be a Gaussian process independent of $\theta$ which satisfies (G*) with respect to $\mathcal{G}$. If we again let $Y_i = X_i + \theta$, then the $Y_i$ satisfy (P*) with respect to $\mathcal{G}$, but do not satisfy (G*) with respect to $\mathcal{G}$, as seen by considering the separating set to be the neighbor set of $v$, and noting that $v$ and $w$ are not independent given any finite subset of variables (since then $\theta$ is not determined with probability 1).

Example B.5. Let $(A_n)_{n \in \mathbb{N}}$ be a collection of infinitely many random variables such that for all $i, j \in \mathbb{N}$, it is not the case that $A_i \perp\!\!\!\perp A_j \mid \{A_k \mid k \neq i,j\}$.

Let $\theta$ be a $\{0,1\}$-valued Bernoulli random variable, independent of all other variables mentioned so far, with $p = 0.5$. Finally, let $X_n = A_n + \theta$. We will show that the collection of random variables $\{X_1, X_2, \ldots\}$ satisfies (P5*) but not (DCP).

Suppose that $\mathcal{G}$ is a graph for which (P*) is satisfied. Note that the events $\{\theta = 0\}$ and $\{\theta = 1\}$ are contained in the $\sigma$-algebra generated by any infinite collection of the $X_i$ (by the law of large numbers). Thus, $\operatorname{Cov}(X_i, X_j \mid \{X_k \mid k \neq i,j\}) = \operatorname{Cov}(A_i, A_j \mid \{A_k \mid k \neq i,j\}) \neq 0$, and so there must be an edge between $i$ and $j$ in $\mathcal{G}$. Then $\mathcal{G}$ is the complete graph on $\mathbb{N}$, and so (G*) is trivially satisfied. Therefore (P*) implies (G*), and so by Proposition 4.4, (P5*) holds for this collection of random variables.

If (DCP) were to hold, then it would be the case that $\bigcap_n \sigma(X_n, X_{n+1}, \ldots) = \sigma(\emptyset)$, the trivial $\sigma$-algebra. However, $\bigcap_n \sigma(X_n, X_{n+1}, \ldots)$ contains the nontrivial events $\{\theta = 1\}$ and $\{\theta = 0\}$, and so (DCP) does not hold.

All that remains is to demonstrate that there exist collections of random variables $(A_n)_{n=1}^{\infty}$ such that there are no pairwise conditional independences, i.e., such that there is no relation of the form $A_i \perp\!\!\!\perp A_j \mid \{A_k \mid k \neq i,j\}$.

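Let $0 < \alpha < 1$, and define the $A_i$ as follows: let $(B_n)_{n \in \mathbb{N}}$ be a collection of i.i.d. $\mathcal{N}(0,1)$ random variables, let $A_1 = B_1$, and for $n > 1$, let $A_n = B_n + \alpha B_{n-1}$. Suppose that $i < j$, and note that

$$\begin{aligned}
\operatorname{Cov}(A_i, A_j \mid \{A_k \mid k \neq i,j\}) &= \lim_{k \to \infty} \operatorname{Cov}(A_i, A_j \mid A_1, \ldots, \widehat{A_i}, \ldots, \widehat{A_j}, \ldots, A_k) \\
&= \operatorname{Cov}(A_i, A_j \mid A_1, \ldots, \widehat{A_i}, \ldots, \widehat{A_j}, A_{j+1}),
\end{aligned}$$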


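where the final line was obtained by using $(A_1, \ldots, A_j) \perp\!\!\!\perp (A_{j+2}, \ldots) \mid A_{j+1}$. Next, recall that

$$\operatorname{Cov}(A_i, A_j \mid A_1, \ldots, \widehat{A_i}, \ldots, \widehat{A_j}, A_{j+1}) = 0$$

if and only if $(\Sigma_{j+1}^{-1})_{ij} = 0$, where $\Sigma_{j+1}$ is the covariance matrix of $(A_1, \ldots, A_{j+1})$. This covariance matrix is given by

$$(\Sigma_{j+1})_{ik} = \begin{cases}
1, & i = k = 1 \\
1 + \alpha^2, & i = k \neq 1 \\
\alpha, & |i - k| = 1 \\
0, & |i - k| > 1.
\end{cases}$$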


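From this formula, it is readily verified that

$$(\Sigma_{j+1}^{-1})_{ik} = \begin{cases}
(-\alpha)^{i-k} \sum_{l=0}^{j+1-i} (-\alpha)^{2l}, & i \ge k \\
(-\alpha)^{k-i} \sum_{l=0}^{j+1-k} (-\alpha)^{2l}, & i < k.
\end{cases}$$

Thus, for $i < j$,

$$(\Sigma_{j+1}^{-1})_{ij} = (-\alpha)^{j-i} + (-\alpha)^{j-i+2} \neq 0.$$

Thus, it is not the case that $A_i \perp\!\!\!\perp A_j \mid \{A_k, k \neq i,j\}$, completing the proof.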
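
The closed form for the inverse covariance matrix can be checked exactly for small sizes. The sketch below is illustrative only: it assumes the closed form reads $(\Sigma_n^{-1})_{ik} = (-\alpha)^{|i-k|} \sum_{l=0}^{n-\max(i,k)} (-\alpha)^{2l}$ for an $n \times n$ matrix (our reading of the summation limits), and the helper names `covariance`, `claimed_inverse`, and `matmul` are hypothetical. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def covariance(n, alpha):
    """Covariance of (A_1, ..., A_n) with A_1 = B_1, A_k = B_k + alpha * B_{k-1}."""
    sigma = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        sigma[i][i] = Fraction(1) if i == 0 else 1 + alpha ** 2
        if i + 1 < n:
            sigma[i][i + 1] = sigma[i + 1][i] = alpha
    return sigma

def claimed_inverse(n, alpha):
    """Assumed closed form, with 1-based indices i and k:
    (-alpha)^{|i-k|} * sum_{l=0}^{n - max(i,k)} (-alpha)^{2l}."""
    inv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n + 1):
        for k in range(1, n + 1):
            tail = sum((-alpha) ** (2 * l) for l in range(n - max(i, k) + 1))
            inv[i - 1][k - 1] = (-alpha) ** abs(i - k) * tail
    return inv

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

alpha = Fraction(1, 3)
n = 6
inverse = claimed_inverse(n, alpha)
product = matmul(covariance(n, alpha), inverse)
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
is_inverse = product == identity

# Entry (i, j) = (1, n-1) of the inverse, the analogue of (i, j) with j + 1 = n.
entry = inverse[0][n - 2]
print(is_inverse, entry)
```

With $n = j + 1$, the entry in row $i$ and column $j$ has only two terms in its sum, which gives the value $(-\alpha)^{j-i} + (-\alpha)^{j-i+2}$ used in the text.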


Proof of Lemma [5^ (P1*): This is trivial from the commutativity of multiplication and intersection.

(P2*): This is trivial from the fact that $\sigma(Y) \subset \sigma(Y, W)$ and $\sigma(W) \subset \sigma(Y, W)$.

Before proving (P3*) and (P4*), we now show that if $\sigma(X) \perp\!\!\!\perp \sigma(W) \mid \sigma(Z)$, then for any $A \in \sigma(X)$, we have $P(A \mid \sigma(Z, W)) = P(A \mid \sigma(Z))$. To see this, let $\mathcal{M} = \{R \cap S \mid R \in \sigma(Z), S \in \sigma(W)\}$, and note that $\mathcal{M}$ is a $\pi$-system which generates $\sigma(Z, W)$. Also, the collection $\mathcal{D}$ of events $T$ satisfying

$$E[1_T E[1_A \mid Z]] = E[1_T 1_A]$$

is readily verified to be a Dynkin system (i.e., it is closed under complement and disjoint union). Thus, if we can show that $\mathcal{M} \subset \mathcal{D}$, then by Dynkin's $\pi$-$\lambda$ theorem, we can conclude from the definition of conditional expectation that $P(A \mid \sigma(Z, W)) = P(A \mid \sigma(Z))$ (noting that $P(A \mid \sigma(Z))$ is a $\sigma(Z, W)$-measurable function).

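If $R \in \sigma(Z)$ and $S \in \sigma(W)$ are arbitrary, then

$$\begin{aligned}
E[1_{R \cap S} E[1_A \mid \sigma(Z)]] &= E[E[1_{R \cap S} E[1_A \mid \sigma(Z)] \mid \sigma(Z)]] && \text{(B.3)} \\
&= E[E[1_A \mid \sigma(Z)] E[1_R 1_S \mid \sigma(Z)]] && \text{(B.4)} \\
&= E[E[1_A \mid \sigma(Z)] E[1_S \mid \sigma(Z)] 1_R] && \text{(B.5)} \\
&= E[E[1_A 1_S \mid \sigma(Z)] 1_R] && \text{(B.6)} \\
&= E[E[1_A 1_S 1_R \mid \sigma(Z)]] && \text{(B.7)} \\
&= E[1_{R \cap S} 1_A], && \text{(B.8)}
\end{aligned}$$

where lines (B.3) and (B.8) are justified by the tower property, lines (B.4), (B.5), and (B.7) by removing what is known, and line (B.6) by the independence assumption. Thus, we have $P(A \mid \sigma(Z, W)) = P(A \mid \sigma(Z))$.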

(P3*): Suppose that $\sigma(X) \perp\!\!\!\perp \sigma(Y, W) \mid \sigma(Z)$. We now wish to show that for arbitrary $A \in \sigma(X)$ and $B \in \sigma(Y)$,

$$P(A \mid \sigma(Z, W)) P(B \mid \sigma(Z, W)) = P(A, B \mid \sigma(Z, W)).$$

By the definition of conditional expectation, it is enough to show that, for $C$ an arbitrary event in $\sigma(Z, W)$, we have

$$E[1_C P(A \mid \sigma(Z, W)) P(B \mid \sigma(Z, W))] = E[1_C 1_A 1_B].$$

Well,

$$\begin{aligned}
& E[1_C E[1_A \mid \sigma(Z, W)] E[1_B \mid \sigma(Z, W)]] && \text{(B.9)} \\
&= E[E[1_A \mid \sigma(Z)] E[1_B 1_C \mid \sigma(Z, W)]] && \text{(B.10)} \\
&= E[E[E[1_A \mid \sigma(Z)] E[1_B 1_C \mid \sigma(Z, W)] \mid \sigma(Z)]] && \text{(B.11)} \\
&= E[E[1_A \mid \sigma(Z)] E[E[1_B 1_C \mid \sigma(Z, W)] \mid \sigma(Z)]] && \text{(B.12)} \\
&= E[E[1_A \mid \sigma(Z)] E[1_B 1_C \mid \sigma(Z)]] && \text{(B.13)} \\
&= E[E[1_A 1_B 1_C \mid \sigma(Z)]] && \text{(B.14)} \\
&= E[1_A 1_B 1_C], && \text{(B.15)}
\end{aligned}$$

where in line (B.10) we use (P2*) and the above argument, and remove what is known. Line (B.12) is obtained by removing what is known, lines (B.11), (B.13), and (B.15) are obtained by the tower property, and line (B.14) is by the assumed independence. This verifies (P3*).

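(P4*): Suppose that $X \perp\!\!\!\perp Y \mid (Z, W)$ and $X \perp\!\!\!\perp W \mid Z$. We wish to show $X \perp\!\!\!\perp (Y, W) \mid Z$. As in the proof of (P3*), we use the fact that $X \perp\!\!\!\perp W \mid Z$ to obtain that for any $A \in \sigma(X)$, we have $P(A \mid \sigma(Z, W)) = P(A \mid \sigma(Z))$.

Suppose now that $A \in \sigma(X)$, $B \in \sigma(Y)$, $C \in \sigma(W)$, and $D \in \sigma(Z)$ are arbitrary. Then

$$\begin{aligned}
E[E[1_A \mid Z] E[1_{B \cap C} \mid Z] 1_D] &= E[E[1_A \mid Z] E[E[1_B 1_C \mid Z, W] \mid Z] 1_D] && \text{(B.16)} \\
&= E[E[E[1_A \mid Z] E[1_B 1_C \mid Z, W] \mid Z] 1_D] && \text{(B.17)} \\
&= E[E[E[1_A \mid Z, W] E[1_B 1_C \mid Z, W] \mid Z] 1_D] && \text{(B.18)} \\
&= E[E[E[1_A \mid Z, W] E[1_B \mid Z, W] 1_C 1_D \mid Z]] && \text{(B.19)} \\
&= E[E[E[1_A 1_B \mid Z, W] 1_C 1_D \mid Z]] && \text{(B.20)} \\
&= E[E[E[1_A 1_B 1_C 1_D \mid Z, W] \mid Z]] && \text{(B.21)} \\
&= E[1_A 1_B 1_C 1_D]. && \text{(B.22)}
\end{aligned}$$

Thus, we may conclude by a $\pi$-$\lambda$ argument (similar to that used to show $P(A \mid \sigma(Z, W)) = P(A \mid \sigma(Z))$ prior to (P3*)) that $\sigma(X) \perp\!\!\!\perp \sigma(Y, W) \mid \sigma(Z)$. □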


Proof of Lemma [O] Let $A, B, C \subset \mathbb{N}$ be finite, and let $D \subset \mathbb{N}$ be an arbitrary subset. Suppose that $X_A \perp\!\!\!\perp X_B \mid X_C, X_D$ and $X_A \perp\!\!\!\perp X_C \mid X_B, X_D$. If $D$ is finite, the verification of (IIP) is a consequence of Proposition [3^ so assume $D \subset \mathbb{N}$ is infinite.


By Theorem 6.2 we have that for any $E \in \sigma(X_A, X_B, X_C, X_D)$,

$$P(E) = \int_E f(x_A, x_B, x_C, x_D) \, dx_A \, dx_B \, dx_C \, d\mu(x_D)$$

for some probability measure $\mu$, where $f(x_A, x_B, x_C, x_D)$ is the multivariate normal density with the mean $B_{A,B,C \mid D}(x_D)$ and covariance matrix $\Sigma_{A,B,C \mid D}$.

Similarly, we have the existence of $f_1(x_B, x_C, x_D)$, $f_2(x_A, x_C, x_D)$, and $f_3(x_A, x_B, x_D)$, the corresponding densities for events in $\sigma(X_B, X_C, X_D)$, $\sigma(X_A, X_C, X_D)$, and $\sigma(X_A, X_B, X_D)$ respectively. Note that these are densities with respect to Lebesgue measure in the corresponding $A$, $B$, and $C$ coordinates, and $\mu$ for $x_D$, and moreover since they are normal densities, they are positive with probability one.

Let $E_A \in \sigma(X_A)$ and $E_B \in \sigma(X_B)$. By the conditional independence $X_A \perp\!\!\!\perp X_B \mid X_C, X_D$, we have

$$P(E_A \mid X_C, X_D) P(E_B \mid X_C, X_D) = P(E_A, E_B \mid X_C, X_D).$$

Let $E = \{(x_A, x_B, x_C, x_D) : f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) \neq f(x_A, x_B, x_C, x_D)\}$. Suppose that $P(E) > 0$. Then $E = E_+ \cup E_-$, where

$$\begin{aligned}
E_+ &= \{(x_A, x_B, x_C, x_D) : f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) > f(x_A, x_B, x_C, x_D)\}, \text{ and} \\
E_- &= \{(x_A, x_B, x_C, x_D) : f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) < f(x_A, x_B, x_C, x_D)\}.
\end{aligned}$$

At least one of these two sets has positive probability, so without loss of generality we will assume $P(E_+) > 0$.

Then

$$\int_{E_+} f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) - f(x_A, x_B, x_C, x_D) \, dx_A \, dx_B \, dx_C \, d\mu(x_D) > 0. \tag{B.23}$$

By Proposition A.6, there is a sequence of sets $E_n$ which are finite unions of finite intersections of sets in $\sigma(X_A)$, $\sigma(X_B)$, $\sigma(X_C)$, or $\sigma(X_D)$, such that $P(E_+ \,\triangle\, E_n) < 1/n$. Thus, since all of the densities under consideration are bounded (this can be seen from the eigenvalue bounds in the statement of Theorem 6.2), we have from the dominated convergence theorem that

$$\int_{E_n} f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) - f(x_A, x_B, x_C, x_D) \, dx_A \, dx_B \, dx_C \, d\mu(x_D)$$

tends to the expression on the left side of line (B.23), and so is positive for sufficiently large $n$.

Since $E_n$ is a finite union of finite intersections, there is some $G = G_A \cap G_B \cap G_C \cap G_D$ with $G_A \in \sigma(X_A)$, etc., which is one of the terms in the union comprising $E_n$ and for which

$$\int_G f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) - f(x_A, x_B, x_C, x_D) \, dx_A \, dx_B \, dx_C \, d\mu(x_D) > 0.$$









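The above integral can be written

$$\begin{aligned}
0 &< \int_{G_D \cap G_C} \int_{G_B} \int_{G_A} f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) - f(x_A, x_B, x_C, x_D) \, dx_A \, dx_B \, dx_C \, d\mu(x_D) \\
&= \int_{G_D \cap G_C} \int_{G_B} f_1(x_B, x_C, x_D) \, dx_B \int_{G_A} f_2(x_A, x_C, x_D) \, dx_A \, dx_C \, d\mu(x_D) \\
&\qquad - \int_{G_D \cap G_C} \int_{G_B} \int_{G_A} f(x_A, x_B, x_C, x_D) \, dx_A \, dx_B \, dx_C \, d\mu(x_D) \\
&= \int_{G_D \cap G_C} \big[ P(G_B \mid X_C = x_C, X_D = x_D) P(G_A \mid X_C = x_C, X_D = x_D) \\
&\qquad - P(G_A \cap G_B \mid X_C = x_C, X_D = x_D) \big] \, dx_C \, d\mu(x_D).
\end{aligned}$$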

This contradicts the independence assumption $X_A \perp\!\!\!\perp X_B \mid X_C, X_D$, and therefore we have that $P(E) = 0$. That is, with probability one

$$f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) = f(x_A, x_B, x_C, x_D).$$

A similar factorization holds for $f_1$ and $f_3$.

Because of this, with probability one it is the case that

$$\begin{aligned}
f_2(x_A, x_C, x_D) &= \frac{f(x_A, x_B, x_C, x_D)}{f_1(x_B, x_C, x_D)} \\
&= \frac{f_3(x_A, x_B, x_D) f_1(x_B, x_C, x_D)}{f_1(x_B, x_C, x_D)} \\
&= f_3(x_A, x_B, x_D).
\end{aligned}$$

Thus almost surely $f_2$ and $f_3$ only depend on $x_A$ and $x_D$, so we may write $f_2(x_A, x_C, x_D) = f_2(x_A, x_D)$, and similarly for $f_3$. Using this we obtain

$$\begin{aligned}
f(x_A, x_B, x_C, x_D) &= f_1(x_B, x_C, x_D) f_2(x_A, x_C, x_D) \\
&= f_1(x_B, x_C, x_D) f_2(x_A, x_D)
\end{aligned}$$

with probability one. Thus $X_A \perp\!\!\!\perp (X_B, X_C) \mid X_D$, completing the verification of (IIP). □

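Details of Example 6.7. Under these assumptions on the $X_i$, we claim that

$$\operatorname{Var}(X_n) \le \sum_{k=0}^{n-1} (1 - \delta)^{2k}.$$

The base case $n = 1$ is trivial, since $\operatorname{Var}(X_1) = 1$. For the induction step, we have

$$\begin{aligned}
\operatorname{Var}(X_{n+1}) &= 1 + \operatorname{Var}\Big( \sum_{j=1}^{N} \beta_{nj} X_{n-j} \Big) \\
&\le 1 + \Big( \sum_{j=1}^{N} |\beta_{nj}| \Big)^2 \max_{1 \le j \le N} \operatorname{Var}(X_{n-j}) \\
&\le 1 + (1 - \delta)^2 \max_{1 \le j \le N} \operatorname{Var}(X_{n-j}) \\
&\le 1 + (1 - \delta)^2 \sum_{k=0}^{n-1} (1 - \delta)^{2k} && \text{(B.24)} \\
&= \sum_{k=0}^{n} (1 - \delta)^{2k},
\end{aligned}$$

where we have used the induction hypothesis to obtain line (B.24). From this, we may conclude that for all $n$,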


Var«,)< |(l- 5 )“ 


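For $n > m$ we have

$$\begin{aligned}
\operatorname{Cov}(X_n, X_m) &= \operatorname{Cov}\Big( \sum_{j=1}^{N} \beta_{nj} X_{n-j}, X_m \Big) \\
&= \sum_{j=1}^{N} \beta_{nj} \operatorname{Cov}(X_{n-j}, X_m) \\
&\le (1 - \delta) \max_{1 \le j \le N} \operatorname{Cov}(X_{n-j}, X_m).
\end{aligned}$$

By iteratively applying the above inequality, we may keep decreasing the subscript on the first $X$ variable in the previous line until the subscript reaches $m$, and we gain a factor of $(1 - \delta)$ each time. Since decreasing $n$ in this iterative manner until it is at most $m$ requires at least $(n - m)/N$ iterations, we obtain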

n—m 

Cov{X„,Xm) < (1-5) 'V max Cov(Xm+j,X„,) 


(B.25) 


. n-m , . 

< (l-5)^Var(X„,) 

a n-m 1 

£ 


\n-m\ j 

Thus, we may set go{n,m) = (1 — £)^^ and note that 


M oo /, \n-k\ + \k-m\ 

, , (1-5) N (1-5) N 

gi{n,m) = 1 ^ - + Y- - ^ - 

^ ti ^ 

( (1-5)^ “(1-5)^' 

_ oU\n-m\+l) g +Y 5 


= O — m\ 

and more generally the same argument gives 


(1-5)^ 


(B.26) 


gr{n,m) = Or \ \n — m\ 


^1-5) A' 


For these $g_r$, the requirements of Theorem 6.2 are clearly satisfied, so all that remains is to demonstrate the required eigenvalue bounds.

The maximum eigenvalue of the matrix $\Sigma_n = (\sigma_{ij})_{i,j \le n}$ is bounded by its maximum row sum, which is bounded by


n 

max y 

i<i<« “ 


(1-5)^ 

5 




k=0 


5(1-5)!/^ 


<c 


Finally, we verify an upper bound on the eigenvalues of the inverses of the covariance matrices. Let $\sigma_{ij} = \operatorname{Cov}(X_i, X_j)$, let $\Sigma_n = (\sigma_{ij})_{i,j \le n}$, and define $\Sigma_n^{-1} = (\sigma^{ij})_{i,j \le n}$.

As discussed in Secfion 5.1.3 of lfT4ll . for any ij <nwe have 


a" = Var(X|X_,y', 
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and 

giJ _ Cov{Xi,Xj\X_^ij}) 

^Var(X,|X_{,,.})Var(Xy|X_{,,.})’ 

which is a partial correlation and so is bounded in magnitude by 1. (Recall the notation 


and similarly forX_|, y}.) Thus 

< V 

< ^Var(X,|X_;)-i Var(Xy|X_y)-i. 

We now claim that for any $i$ and $n$ with $i \le n$, the quantity $\operatorname{Var}(X_i \mid X_{-i})^{-1}$ is bounded above by a uniform constant $C(\delta, N)$. To see this, note that $\operatorname{Var}(X_i \mid X_{-i})$ depends only on $\beta_{kj}$ for $i \le k \le i + N$ and $0 \le j \le N$, and is nonzero for any choice of the $\beta_{kj}$ with $\sum_{j=1}^{N} |\beta_{kj}| \le 1 - \delta$. Moreover, $\operatorname{Var}(X_i \mid X_{-i})$ is a continuous function of $\beta$, and the condition $\sum_{j=1}^{N} |\beta_{kj}| \le 1 - \delta$ defines a compact domain for $\beta$. Thus the continuous function $\operatorname{Var}(X_i \mid X_{-i})$ of $\beta$ has a compact image which does not include zero, and so for any choice of the $\beta_{ij}$, the quantity $\operatorname{Var}(X_i \mid X_{-i})^{-1}$ is bounded above by some constant $C(\delta, N)$. Therefore we may conclude that $|\sigma^{ij}| \le C(\delta, N)$.

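Next, suppose $j - i > N$. We have $X_i \in \sigma(\varepsilon_1, \ldots, \varepsilon_i)$, and the $\varepsilon_n$ are i.i.d., so $X_i \perp\!\!\!\perp \varepsilon_j$. Also, $X_{-\{i,j\}}$ contains $X_{j-1}, \ldots, X_{j-N}$ (since $j - i > N$), so

$$\begin{aligned}
P(X_j \le a, X_i \le b \mid X_{-\{i,j\}}) &= P\Big( \sum_{k=1}^{N} \beta_{jk} X_{j-k} + \varepsilon_j \le a, \ X_i \le b \ \Big| \ X_{-\{i,j\}} \Big) \\
&= P\Big( \varepsilon_j \le a - \sum_{k=1}^{N} \beta_{jk} X_{j-k}, \ X_i \le b \ \Big| \ X_{-\{i,j\}} \Big) \\
&= P\Big( \varepsilon_j \le a - \sum_{k=1}^{N} \beta_{jk} X_{j-k} \ \Big| \ X_{-\{i,j\}} \Big) P(X_i \le b \mid X_{-\{i,j\}}) && \text{(B.27)} \\
&= P(X_j \le a \mid X_{-\{i,j\}}) P(X_i \le b \mid X_{-\{i,j\}}),
\end{aligned}$$

where we obtain line (B.27) from the fact that $\varepsilon_j$ is independent of $X_k$ for all $k < j$, and $X_k \in \sigma(\varepsilon_1, \ldots, \varepsilon_{j-1})$ for all $k < j$.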

Thus $X_j$ and $X_i$ are conditionally independent given the other variables, and $\sigma^{ij} = 0$ when $|i - j| > N$. Therefore for any $n$, the matrix $\Sigma_n^{-1}$ has at most $2N + 1$ nonzero entries in any row. Within each row, each entry is bounded by $C(\delta, N)$, so $(2N + 1) C(\delta, N)$ is an upper bound on the largest eigenvalue of $\Sigma_n^{-1}$. This provides a positive lower bound on the smallest eigenvalue of $\Sigma_n$, and concludes the verification of the conditions for Theorem 6.2, allowing us to conclude that these $X_i$ satisfy (P5*).

The argument used to show that $X_j$ is independent of $X_i$ given the other variables when $|i - j| > N$ works even when there are infinitely many other variables. Therefore it verifies (P*) with respect to the graph on $\mathbb{N}$ which has an edge between $i$ and $j$ if and only if $|i - j| \le N$. Thus, by Theorem 2.3 we may conclude that the collection of variables $X_i$ satisfies (G*) with respect to this graph as well. □
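
The row-sum argument used for the eigenvalue bounds can be illustrated on a small banded matrix. This is a hedged sketch rather than the construction from the example: the test matrix, the bandwidth `N`, the entry bound `C`, and the helper names are all illustrative assumptions. It relies only on the standard fact that the largest eigenvalue of a symmetric matrix is at most its maximum absolute row sum.

```python
def max_abs_row_sum(matrix):
    """Maximum over rows of the sum of absolute entries (the infinity norm)."""
    return max(sum(abs(x) for x in row) for row in matrix)

def power_iteration(matrix, iterations=500):
    """Rough estimate of the dominant eigenvalue magnitude via power iteration,
    normalising by the max-norm at each step."""
    n = len(matrix)
    v = [1.0] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = max(abs(x) for x in w)
        v = [x / estimate for x in w]
    return estimate

# Illustrative banded symmetric matrix: entries 0.5^{|i-j|} when |i-j| <= N, else 0.
n, N, C = 8, 2, 1.0  # C bounds every entry in absolute value
banded = [[0.5 ** abs(i - j) if abs(i - j) <= N else 0.0 for j in range(n)]
          for i in range(n)]

row_sum = max_abs_row_sum(banded)
lam = power_iteration(banded)
print(lam, row_sum, (2 * N + 1) * C)
```

Each row has at most $2N + 1$ nonzero entries of size at most $C$, so the estimate is bounded by the row sum, which in turn is bounded by $(2N + 1) C$.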

Details of Example 6.8. We demonstrate that the desired conditions are satisfied for these covariance functions, beginning by verifying that the $\sigma_{ij}$ are diagonally dominant for all sufficiently small $V$ (depending on $\alpha$). For any fixed $p = c(i) \in \mathbb{Z}^m$, and any choice of $\alpha$ and $V$,

$$\sum_{q \in \mathbb{Z}^m} \exp(-d(p, q)^{\alpha} / V) \le 1 + \sum_{n=1}^{\infty} (2n + 1)^m \exp(-n^{\alpha} / V) < \infty,$$

and moreover for all $q \neq p$, the quantity $\exp(-d(p, q)^{\alpha} / V)$ tends to zero as $V$ tends to zero. Thus, by the dominated convergence theorem,

$$\sum_{j \neq i} |\sigma_{ij}(V)| \le \sum_{q \neq p} \exp(-d(p, q)^{\alpha} / V) \to 0 < 1 = \sigma_{ii}(V). \tag{B.28}$$


Because the sum above (as a function of $V$) is independent of $p$, we may conclude that for $V$ sufficiently small (again, depending on the fixed choice of $\alpha$), the resulting covariance matrix is diagonally dominant. We omit the verification of the bounds on the $g_r$ for these covariances, but note that the verification is similar to that used to obtain line (B.26).

Thus, for sufficiently small positive $V$, the Gaussian processes with covariance functions given by

$$\sigma_{ij} = \exp(-d(c(i), c(j))^{\alpha} / V)$$

satisfy the requirements of Theorem 6.2, and so satisfy (IIP) and (DCP), and hence (P5*). □
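
The diagonal dominance claim for small $V$ can be explored numerically in the simplest setting. The sketch below assumes $m = 1$, so that sites are integers, and assumes the metric is $d(p, q) = |p - q|$; the function name `off_diagonal_row_sum` and the truncation at `terms` summands are illustrative choices, not part of the original example.

```python
import math

def off_diagonal_row_sum(alpha, V, terms=10_000):
    """Truncated sum over q != p of exp(-|p - q|**alpha / V) on the integer line.
    By symmetry each distance n >= 1 occurs twice, and the sum does not depend on p.
    (Assumptions: m = 1 and d(p, q) = |p - q|.)"""
    return 2.0 * sum(math.exp(-(n ** alpha) / V) for n in range(1, terms + 1))

alpha = 1.0
small_V = off_diagonal_row_sum(alpha, V=0.5)  # strongly decaying covariances
large_V = off_diagonal_row_sum(alpha, V=5.0)  # slowly decaying covariances
print(small_V, large_V)
```

With unit diagonal entries, the row is diagonally dominant exactly when this sum is below 1. For $\alpha = 1$ the sum is a geometric series, so the truncation error is negligible; for smaller $\alpha$ more terms may be needed.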


Proof of Example 7.3 For this proof, we will apply Theorem 2.8 For any finite length m sequence 


(/i, ...,/m) disjoint from b\,...,br, we have 


m 

- n min (P(X,-^. =y,X,v+i =z)) 


j= 

=: Cl > 0 . 


Next, suppose that n G N and B C N is arbitrary, and that A is an event in ...). First, 

suppose that $B$ is infinite. Then, for any $m$, there is some $b \in B$ with $b > m$, so for all sufficiently large $n$ we have $m < b < n$. Thus

$$P(A \mid X_1, \ldots, X_m, X_B = x_B) = P(A \mid X_B = x_B)$$

is constant, and so

$$\operatorname{Var}(P(A \mid X_1, \ldots, X_m, X_B) \mid X_B = x_B) = 0$$

for all sufficiently large $n$. On the other hand, if $B$ is finite, we have that there is some maximal $b \in B$. If $b > m$, the above argument implies that

$$\operatorname{Var}(P(A \mid X_1, \ldots, X_m, X_B) \mid X_B = x_B) = 0$$

for all sufficiently large $n$, and so we may assume $m > b$. Since the $X_i$ form a Markov chain, and since $X_m$ can only take values in $\{0, 1\}$, we have that if $n > m$,

$$\begin{aligned}
\operatorname{Var}(P(A \mid X_1, \ldots, X_m, X_B) \mid X_B = x_B) &= \operatorname{Var}(P(A \mid X_m, X_B) \mid X_B = x_B) \\
&\le |P(A \mid X_m = 1, X_B = x_B) - P(A \mid X_m = 0, X_B = x_B)|. && \text{(B.29)}
\end{aligned}$$
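
For any arbitrary $m \le r$ we have the bound

$$\begin{aligned}
& |P(A \mid X_r = 1, X_B = x_B) - P(A \mid X_r = 0, X_B = x_B)| \\
&= |P(A \mid X_{r+1} = 1, X_B = x_B) P(X_{r+1} = 1 \mid X_r = 1) \\
&\qquad + P(A \mid X_{r+1} = 0, X_B = x_B) P(X_{r+1} = 0 \mid X_r = 1) \\
&\qquad - P(A \mid X_{r+1} = 1, X_B = x_B) P(X_{r+1} = 1 \mid X_r = 0) \\
&\qquad - P(A \mid X_{r+1} = 0, X_B = x_B) P(X_{r+1} = 0 \mid X_r = 0)| \\
&= |P(A \mid X_{r+1} = 1, X_B = x_B) (P(X_{r+1} = 1 \mid X_r = 1) - P(X_{r+1} = 1 \mid X_r = 0)) \\
&\qquad + P(A \mid X_{r+1} = 0, X_B = x_B) (P(X_{r+1} = 0 \mid X_r = 1) - P(X_{r+1} = 0 \mid X_r = 0))| \\
&= |P(A \mid X_{r+1} = 1, X_B = x_B)(p_r - t_r) + P(A \mid X_{r+1} = 0, X_B = x_B)(t_r - p_r)| \\
&= |(P(A \mid X_{r+1} = 1, X_B = x_B) - P(A \mid X_{r+1} = 0, X_B = x_B))(p_r - t_r)| \\
&= |P(A \mid X_{r+1} = 1, X_B = x_B) - P(A \mid X_{r+1} = 0, X_B = x_B)| \, |p_r - t_r|.
\end{aligned}$$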
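
The one-step identity above says that each backward step of the chain multiplies the gap between the two conditional probabilities by $p_r - t_r$. The following sketch checks this exactly for a two-state chain, ignoring the conditioning on $X_B$ for simplicity. It reads $p_r = P(X_{r+1} = 1 \mid X_r = 1)$ and $t_r = P(X_{r+1} = 1 \mid X_r = 0)$, as in the computation above; the function names and the particular choice $p_r = r/(r+1)$, $t_r = 0$ are illustrative assumptions.

```python
from fractions import Fraction

def step_back(h_next, p, t):
    """Given h_next = (P(A | X_{r+1}=0), P(A | X_{r+1}=1)), return
    (P(A | X_r=0), P(A | X_r=1)) for a two-state Markov chain with
    p = P(X_{r+1}=1 | X_r=1) and t = P(X_{r+1}=1 | X_r=0)."""
    h0, h1 = h_next
    return (h1 * t + h0 * (1 - t), h1 * p + h0 * (1 - p))

# One step: the gap contracts by exactly (p - t).
h = (Fraction(1, 5), Fraction(9, 10))
p, t = Fraction(3, 4), Fraction(1, 3)
g = step_back(h, p, t)
one_step_ok = (g[1] - g[0]) == (p - t) * (h[1] - h[0])

def gap_after(m, n):
    """Start from the extreme gap 1 at time n - 1 and step back to time m,
    using the illustrative choice p_r = r / (r + 1), t_r = 0."""
    h = (Fraction(0), Fraction(1))
    for r in range(n - 2, m - 1, -1):
        h = step_back(h, Fraction(r, r + 1), Fraction(0))
    return h[1] - h[0]

gap = gap_after(2, 102)
print(one_step_ok, gap)
```

For this choice the product of the factors $r/(r+1)$ telescopes, so the gap shrinks like $m/(n-1)$ and tends to zero as $n$ grows, while $\sum_r (1 - (p_r - t_r)) = \sum_r 1/(r+1)$ diverges.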


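Thus, by induction, we have that

$$\begin{aligned}
& |P(A \mid X_m = 1, X_B = x_B) - P(A \mid X_m = 0, X_B = x_B)| \\
&\le |P(A \mid X_{n-1} = 1, X_B = x_B) - P(A \mid X_{n-1} = 0, X_B = x_B)| \prod_{r=m}^{n-2} |p_r - t_r| \\
&\le \prod_{r=m}^{n-2} |p_r - t_r|,
\end{aligned}$$

and so by line (B.29), we have that

$$\operatorname{Var}(P(A \mid X_1, \ldots, X_m, X_B) \mid X_B = x_B) \le \prod_{r=m}^{n-2} |p_r - t_r|.$$

Since $\sum_n (1 - (p_n - t_n)) = \infty$, we have that $\prod_{r=m}^{n} |p_r - t_r| \to 0$ as $n \to \infty$, by the usual arguments involving the logarithm. We may now finish the proof by invoking Theorem 2.8. □

The next two results are used to prove Proposition A.3.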


Lemma B.6. Suppose given a graph $\mathcal{G} = (V, E)$ with $V = \{0\} \cup \mathbb{N}$ and parameter set $\Theta$. If the limit

$$\lim_{n \to \infty} P^n((X_1, \ldots, X_m) = v)$$

exists for all $v \in \{0,1\}^m$ and all $m \in \mathbb{N}$, the infinite Ising model distribution from Definition 7.8 exists and is unique.

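Proof. Note that for all finite $n$ we have

$$\sum_{v \in \{0,1\}^m} P^n((X_1, \ldots, X_m) = v) = 1,$$

and so

$$\begin{aligned}
\sum_{v \in \{0,1\}^m} P^m((X_1, \ldots, X_m) = v) &= \sum_{v \in \{0,1\}^m} \lim_{n \to \infty} P^n((X_1, \ldots, X_m) = v) \\
&= \lim_{n \to \infty} \sum_{v \in \{0,1\}^m} P^n((X_1, \ldots, X_m) = v) \\
&= \lim_{n \to \infty} 1 = 1.
\end{aligned}$$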

Thus the distributions $P^m$ are actually probability distributions on $\{0,1\}^m$. Moreover, by the properties of marginalization, for $m_1 < m_2$, we have for all sufficiently large $n$, and any $v \in \{0,1\}^{m_1}$, that


= V) = V), 

and so 


and so the marginalizations of the $P^m$ are consistent. Thus by the Kolmogorov consistency theorem, there exists a unique joint distribution $P$ on $\{0,1\}^{\infty}$ with finite-dimensional distributions given by the distributions $P^m$ and their marginalizations. □


We now state a more explicit condition which can be used to verify whether the limit in line (7.3) exists.

Proposition B.7. Suppose given a graph $\mathcal{G} = (\{0\} \cup \mathbb{N}, E)$ and parameter set $\Theta$. For $v \in \{0,1\}^m$, and $n > m$, define the function


(B.30) 


fm{v,n) := 


Lxg{ 0,I}"-"> SXp ifLi,j>m + llj>m ^Oj^j) 

where for $x \in \{0,1\}^{n-m}$ we have labeled the first coordinate as $x_{m+1}$ and the final coordinate $x_n$, and for ease of notation we have used $v_0 = 1$.

Then the limit in line (7.3) exists and is nonzero for all $m$ and $v$ if and only if the limit $\lim_{n \to \infty} f_m(v, n)$ exists and is nonzero for all $m$ and $v$.

Proof. We have

$$f_m(v, n) = \frac{P^n((X_1, \ldots, X_m) = (v_1, \ldots, v_m))}{P^n((X_1, \ldots, X_m) = (0, \ldots, 0))}. \tag{B.31}$$

From this, it is easy to see that if the limit from line (7.3) exists and is finite and positive for all $m$ and $v$, then the same holds of $\lim_{n \to \infty} f_m(v, n)$.

For the other direction, note by line (B.31) that for any $n$ we have

$$P^n((X_1, \ldots, X_m) = (v_1, \ldots, v_m)) = f_m(v, n) P^n((X_1, \ldots, X_m) = (0, \ldots, 0)).$$

Let $F_m(n) = \sum_{v \in \{0,1\}^m} f_m(v, n)$. Then $P^n((X_1, \ldots, X_m) = (0, \ldots, 0)) = 1 / F_m(n)$. Assuming that $f_m(v, n)$ has a finite, nonzero limit as $n \to \infty$ for all $v$ and $m$, we obtain that $F_m(n)$ does as well, and so

$$\lim_{n \to \infty} 1 / F_m(n) = \lim_{n \to \infty} P^n((X_1, \ldots, X_m) = (0, \ldots, 0)) =: P^m((X_1, \ldots, X_m) = (0, \ldots, 0))$$

exists and is nonzero as well. Finally, note that

$$P^m((X_1, \ldots, X_m) = v) := \lim_{n \to \infty} P^n((X_1, \ldots, X_m) = v) = \lim_{n \to \infty} f_m(v, n) / F_m(n)$$

exists and is nonzero since $f_m(v, n)$ and $F_m(n)$ have limits as $n$ tends to $\infty$. □
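
The normalization step in this proof, recovering probabilities from the ratios $f_m(v, n)$ via $P(0, \ldots, 0) = 1 / F_m(n)$, can be illustrated for a fixed $n$ on a toy distribution. The sketch below uses an arbitrary positive probability mass function on $\{0,1\}^2$ rather than an Ising model; the weights and the helper name `ratios_to_probabilities` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def ratios_to_probabilities(f):
    """f maps each v in {0,1}^m to P(v) / P(0,...,0).
    Recover P using P(0,...,0) = 1 / F and P(v) = f(v) / F, where F = sum of f."""
    total = sum(f.values())
    return {v: value / total for v, value in f.items()}

m = 2
# Arbitrary positive weights standing in for a finite-n distribution.
weights = {(0, 0): Fraction(1), (0, 1): Fraction(2),
           (1, 0): Fraction(3), (1, 1): Fraction(6)}
normaliser = sum(weights.values())
pmf = {v: w / normaliser for v, w in weights.items()}

zero = (0,) * m
f = {v: pmf[v] / pmf[zero] for v in product((0, 1), repeat=m)}
recovered = ratios_to_probabilities(f)
print(recovered[zero], recovered == pmf)
```

Because the ratios determine the distribution through a finite sum and a division, convergence of every ratio to a finite nonzero limit carries over to the probabilities themselves.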


Proof of Proposition A.3. If the infinite Ising model is well-defined with $P((X_1, \ldots, X_m) = v) > 0$ for all $m, v$, then by the definition of the infinite Ising model the quantity

$$f_m(v, n) = \frac{P^n((X_1, \ldots, X_m) = (v_1, \ldots, v_m))}{P^n((X_1, \ldots, X_m) = (0, \ldots, 0))}$$

has a nonzero limit for all $m$ and $v$, and so the equivalent expression from line (B.30) does as well. For the other direction, combine Lemma B.6 with Proposition B.7. □